Error Analysis and Propagation in Metabolomics Data Analysis
Error analysis plays a fundamental role in describing the uncertainty in experimental results. It has several uses in metabolomics, including experimental design, quality control of experiments, the selection of appropriate statistical methods, and the determination of uncertainty in results. Furthermore, the importance of error analysis has grown with the increasing number, complexity, and heterogeneity of measurements characteristic of ‘omics research. The increase in data complexity is particularly problematic for metabolomics, which has more heterogeneity than other omics technologies due to the much wider range of molecular entities detected and measured. This review introduces the fundamental concepts of error analysis as they apply to a wide range of metabolomics experimental designs and discusses current methodologies for determining the propagation of uncertainty in appropriate metabolomics data analysis. These methodologies include analytical derivation and approximation techniques, Monte Carlo error analysis, and error analysis in metabolic inverse problems. Current limitations of each methodology with respect to metabolomics data analysis are also discussed.
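To make the contrast between analytical approximation and Monte Carlo error analysis concrete, here is a minimal sketch for the ratio of two measured intensities. The function names, the independent Gaussian error model, and the numeric values are assumptions chosen for illustration; they are not taken from the review.

```python
import math
import random
import statistics

def analytic_ratio_sd(a, sd_a, b, sd_b):
    """First-order (delta method) standard deviation of a/b for independent a and b."""
    ratio = a / b
    return abs(ratio) * math.sqrt((sd_a / a) ** 2 + (sd_b / b) ** 2)

def monte_carlo_ratio_sd(a, sd_a, b, sd_b, n_draws=100_000, seed=0):
    """Standard deviation of a/b estimated by sampling Gaussian-distributed inputs."""
    rng = random.Random(seed)
    draws = [rng.gauss(a, sd_a) / rng.gauss(b, sd_b) for _ in range(n_draws)]
    return statistics.stdev(draws)

# Hypothetical peak intensities (arbitrary units), each with 5% relative error.
analytic = analytic_ratio_sd(100.0, 5.0, 50.0, 2.5)
simulated = monte_carlo_ratio_sd(100.0, 5.0, 50.0, 2.5)
print(f"analytic sd = {analytic:.4f}, Monte Carlo sd = {simulated:.4f}")
```

With small relative errors the two estimates should agree closely. The Monte Carlo route becomes more useful when the function is strongly nonlinear or the errors are large, which is where the first-order approximation tends to break down.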
Finding High-Quality Metal Ion-Centric Regions Across the Worldwide Protein Data Bank
As the number of macromolecular structures in the worldwide Protein Data Bank (wwPDB) continues to grow rapidly, more attention is being paid to the quality of its data, especially for use in aggregated structural and dynamics analyses. In this study, we systematically analyzed 3.5 Å regions around all metal ions across all PDB entries with supporting electron density maps available from the PDB in Europe. All resulting metal ion-centric regions were evaluated with respect to four quality-control criteria involving electron density resolution, atom occupancy, symmetry atom exclusion, and regional electron density discrepancy. The resulting metal binding sites that pass all four criteria possess high regional structural quality and should be beneficial to a wide variety of downstream analyses. This study demonstrates an approach for the pan-PDB evaluation of metal binding site structural quality with respect to underlying X-ray crystallographic experimental data represented in the available electron density maps of proteins. For non-crystallographers in particular, we hope to change the focus and discussion of structural quality from a global evaluation to a regional evaluation, since all structural entries in the wwPDB appear to have regions of both high and low structural quality.
Robust Moiety Model Selection Using Mass Spectrometry Measured Isotopologues
Stable isotope resolved metabolomics (SIRM) experiments use stable isotope tracers to provide superior metabolomics datasets for metabolic flux analysis and metabolic modeling. Since assumptions of model correctness can seriously compromise interpretation of metabolic flux results, we have developed a metabolic modeling software package specifically designed for moiety model comparison and selection based on the metabolomics data provided. Here, we tested the effectiveness of model selection with two time-series mass spectrometry (MS) isotopologue datasets for uridine diphosphate N-acetyl-d-glucosamine (UDP-GlcNAc) generated from different platforms utilizing direct infusion nanoelectrospray and liquid chromatography. Analysis results demonstrate the robustness of our model selection methods by the successful selection of the optimal model from over 40 models provided. Moreover, the effects of specific optimization methods, degree of optimization, selection criteria, and specific objective functions on model selection are illustrated. Overall, these results indicate that over-optimization can lead to model selection failure, but combining multiple datasets can help control this overfitting effect. The implication is that SIRM datasets of reasonable quality in public repositories can be combined with newly acquired datasets to improve model selection. Furthermore, curation efforts of public metabolomics repositories to maintain high data quality could have a huge impact on future metabolic modeling efforts.
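As a rough illustration of how a selection criterion can trade goodness of fit against model complexity, the sketch below ranks candidate models by the Akaike information criterion (AIC). The abstract does not say which selection criteria were used, so AIC is an assumption here, and the model names and fit values are hypothetical.

```python
import math

def aic(rss, n_points, n_params):
    """Akaike information criterion for a least-squares fit with Gaussian errors."""
    return n_points * math.log(rss / n_points) + 2 * n_params

def select_model(fits, n_points):
    """Return the name of the candidate with the lowest AIC.

    fits maps model name -> (residual sum of squares, number of free parameters).
    """
    return min(fits, key=lambda name: aic(fits[name][0], n_points, fits[name][1]))

# Hypothetical fit results for three candidate models on 20 data points.
fits = {
    "model_A": (4.0, 3),
    "model_B": (3.9, 6),  # marginally better fit, twice the parameters
    "model_C": (9.0, 2),
}
best = select_model(fits, n_points=20)
print(best)
```

Here the marginally better fit of the larger model does not offset its parameter penalty, so the smaller model is preferred.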
Development and In silico Evaluation of Large-Scale Metabolite Identification Methods using Functional Group Detection for Metabolomics
Large-scale identification of metabolites is key to elucidating and modeling metabolism at the systems level. Advances in metabolomics technologies, particularly ultra-high resolution mass spectrometry, enable comprehensive and rapid analysis of metabolites. However, a significant barrier to meaningful data interpretation is the identification of a wide range of metabolites, including unknowns, and the determination of their role(s) in various metabolic networks. Chemoselective (CS) probes to tag metabolite functional groups combined with high mass accuracy provide additional structural constraints for metabolite identification and quantification. We have developed a novel algorithm, Chemically Aware Substructure Search (CASS), that efficiently detects functional groups within existing metabolite databases, allowing for combined molecular formula and functional group (from CS tagging) queries to aid in metabolite identification without a priori knowledge. Analysis of the isomeric compounds in both the Human Metabolome Database (HMDB) and KEGG Ligand demonstrated a high percentage of isomeric molecular formulae (43% and 28% respectively), indicating the necessity for techniques such as CS-tagging. Furthermore, these two databases have only moderate overlap in molecular formulae. Thus, it is prudent to use multiple databases in metabolite assignment, since each major metabolite database represents different portions of metabolism within the biosphere. In silico analysis of various CS-tagging strategies under different conditions for adduct formation demonstrates that combined FT-MS derived molecular formulae and CS-tagging can uniquely identify up to 71% of KEGG and 37% of the combined KEGG/HMDB database versus 41% and 17% respectively without adduct formation. This difference between database isomer disambiguation highlights the strength of CS-tagging for non-lipid metabolite identification. However, unique identification of complex lipids still needs additional information.
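The idea of a combined molecular formula and functional group query can be sketched on a toy table. This is not the CASS algorithm: functional groups are pre-annotated by hand here rather than detected by substructure search, and the five-compound table and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical mini database: (name, molecular formula, annotated functional groups).
COMPOUNDS = [
    ("glucose", "C6H12O6", {"aldehyde", "hydroxyl"}),
    ("fructose", "C6H12O6", {"ketone", "hydroxyl"}),
    ("alanine", "C3H7NO2", {"amine", "carboxyl"}),
    ("sarcosine", "C3H7NO2", {"amine", "carboxyl"}),
    ("pyruvic acid", "C3H4O3", {"ketone", "carboxyl"}),
]

def isomeric_formula_fraction(compounds):
    """Fraction of distinct molecular formulae shared by more than one compound."""
    by_formula = defaultdict(list)
    for name, formula, _groups in compounds:
        by_formula[formula].append(name)
    shared = sum(1 for names in by_formula.values() if len(names) > 1)
    return shared / len(by_formula)

def query(compounds, formula, group):
    """Names of compounds matching both a molecular formula and a functional group."""
    return [name for name, f, groups in compounds if f == formula and group in groups]

fraction = isomeric_formula_fraction(COMPOUNDS)
ketone_hexose = query(COMPOUNDS, "C6H12O6", "ketone")
amine_c3 = query(COMPOUNDS, "C3H7NO2", "amine")
print(fraction, ketone_hexose, amine_c3)
```

In this toy table, a ketone tag separates fructose from glucose, while an amine tag alone cannot separate alanine from sarcosine. That mirrors the abstract's point that some isomers need additional information for unique identification.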
InChI Isotopologue and Isotopomer Specifications
This work presents a proposed extension to the International Union of Pure and Applied Chemistry (IUPAC) International Chemical Identifier (InChI) standard that allows the representation of isotopically-resolved chemical entities at varying levels of ambiguity in isotope location. This extension includes an improved interpretation of the current isotopic layer within the InChI standard and a new isotopologue layer specification for representing chemical intensities with ambiguous isotope localization. Both improvements support the unique isotopically-resolved chemical identification of features detected and measured in analytical instrumentation, specifically nuclear magnetic resonance and mass spectrometry. This new extension to the InChI standard would enable improved annotation of analytical datasets characterizing chemical entities, supporting the FAIR (Findable, Accessible, Interoperable, and Reusable) guiding principles of data stewardship for chemical datasets, ultimately promoting Open Science in chemistry.
A FAIR approach for detecting and sharing PFAS hot-spot areas and water systems
Per- and polyfluoroalkyl substances (PFAS) contamination in water sources near potential PFAS users is well known. Therefore, it is useful for PFAS stakeholders to visualize hot-spot areas and bring attention to the water systems that are near those areas. Towards this end, we extracted information about PFAS sources, drinking water information, sewer water information, and Source Water Assessment Protection Program (SWAPP) information from publicly available sources to create five different maps in ArcGIS Online that highlight PFAS contamination in relation to potential PFAS users. Following the FAIR (Findable, Accessible, Interoperable and Reusable) principles, we created a Figshare repository that includes all data and associated metadata for these five ArcGIS maps. Moreover, the Figshare repository includes a metadata description of the maps in JSON format that adheres to a draft Minimum Information about Geospatial Information System (MIAGIS) standard we have developed. We hope this MIAGIS draft will assist in establishing a GIS standards group that will develop the draft into a full standard for the wider GIS community. We have also developed a miagis Python package that facilitates the generation of a MIAGIS-compliant JSON metadata file.