67 research outputs found
Development of Database Assisted Structure Identification (DASI) Methods for Nontargeted Metabolomics
Metabolite structure identification remains a significant challenge in nontargeted metabolomics research. One commonly used strategy relies on searching biochemical databases using exact mass. However, this approach fails when the database does not contain the unknown metabolite (i.e., for unknown-unknowns). For these cases, constrained structure generation with combinatorial structure generators provides a potential option. Here we evaluated structure generation constraints based on the specification of: (1) substructures required (i.e., seed structures); (2) substructures not allowed; and (3) filters to remove incorrect structures. Our approach (database assisted structure identification, DASI) used predictive models in MolFind to find candidate structures with chemical and physical properties similar to the unknown. These candidates were then used for seed structure generation using eight different structure generation algorithms. One algorithm was able to generate correct seed structures for 21/39 test compounds. Eleven of these seed structures were large enough to constrain the combinatorial structure generator to fewer than 100,000 structures. In 35/39 cases, at least one algorithm was able to generate a correct seed structure. The DASI method has several limitations and will require further experimental validation and optimization. At present, it seems most useful for identifying the structure of unknown-unknowns with molecular weights <200 Da.
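The exact-mass search that the abstract above takes as its starting point can be illustrated with a minimal sketch. This is not the MolFind or DASI implementation: the three-compound `DATABASE`, the `Compound` type, the `search_by_exact_mass` function, and the 10 ppm default tolerance are all assumptions made for illustration. Only the monoisotopic masses are standard values.

```python
from typing import List, NamedTuple


class Compound(NamedTuple):
    name: str
    monoisotopic_mass: float  # in Da


# Hypothetical three-entry database (illustration only).
DATABASE = [
    Compound("glucose", 180.06339),
    Compound("caffeine", 194.08038),
    Compound("citric acid", 192.02700),
]


def search_by_exact_mass(mass: float,
                         database: List[Compound],
                         tol_ppm: float = 10.0) -> List[Compound]:
    """Return every compound whose mass lies within tol_ppm of the query mass."""
    window = mass * tol_ppm / 1e6  # absolute tolerance in Da
    return [c for c in database if abs(c.monoisotopic_mass - mass) <= window]


# A measured mass close to glucose returns one candidate.
hits = search_by_exact_mass(180.0634, DATABASE)
print([c.name for c in hits])

# A mass with no database entry returns nothing: the "unknown-unknown" case,
# where a database search alone cannot yield a structure.
print(search_by_exact_mass(150.0, DATABASE))
```

An empty result is the situation in which structure generation, rather than database lookup, becomes the remaining option.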
Evaluating Patients' Perspective on Metoclopramide with Text Mining
Pharmacovigilance attempts to detect, assess, understand, and prevent adverse effects or any other possible drug-related problems. All pharmacovigilance systems in existence today rely on voluntary reporting. The effectiveness of these systems is limited by underreporting and reporting bias. A vast amount of patient-generated data on possible adverse effects can be found on health-related web forums and social media. These user-generated data can be used to augment traditional pharmacovigilance systems. The purpose of this study is to examine the usefulness of such data found on health-related web forums by evaluating patients' perspective on metoclopramide. Data were obtained from two popular health-related forums, Drugs.com and WebMD.com. Web scraping was used to obtain the necessary data in tabulated form. According to patients' reports on Drugs.com, the most common uses of metoclopramide were for "migraine" and "nausea," while the least frequently reported use was for "GERD." The most frequently reported side effects included "anxiety" and "headache." Fatigue and akathisia were the least frequently mentioned adverse effects. According to patient perspectives from WebMD.com, the most reported indications for metoclopramide were "nausea" and "vomit," while the least reported indication was "migraine." According to the WebMD data, the most frequently reported adverse effects of metoclopramide were "spasm," "cough," "bloat," and "drowsiness," while the least reported adverse effect was "Parkinson's."
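The most and least frequently reported terms described above come down to counting term occurrences across tabulated reviews. The sketch below shows one way to do that counting. It is not the study's pipeline: the `reviews` rows are invented placeholders rather than data from Drugs.com or WebMD.com, and `term_frequencies` is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Invented placeholder rows: (review_id, side-effect terms found in the review).
reviews = [
    (1, ["anxiety", "headache"]),
    (2, ["anxiety"]),
    (3, ["fatigue", "anxiety", "headache"]),
]


def term_frequencies(rows: Iterable[Tuple[int, List[str]]]) -> Counter:
    """Count the number of reviews that mention each term."""
    counts: Counter = Counter()
    for _, terms in rows:
        counts.update(set(terms))  # a term counts once per review
    return counts


freq = term_frequencies(reviews)
most_term, most_count = freq.most_common(1)[0]
least_term = min(freq, key=freq.get)
print(most_term, most_count, least_term)
```

Counting each term once per review keeps a single long post from dominating the totals. Whether the study counted this way is not stated in the abstract.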
Evaluating Patients' Perspective on Metoclopramide with Text Mining
Pharmacovigilance attempts to detect, assess, understand, and prevent adverse effects or any other possible drug-related problems. All pharmacovigilance systems in existence today rely on voluntary reporting. The effectiveness of these systems is limited by underreporting and reporting bias. A vast amount of patient-generated data on possible adverse effects can be found on health-related web forums and social media. These user-generated data can be used to augment traditional pharmacovigilance systems. The purpose of this study is to examine the usefulness of such data found on health-related web forums by evaluating patients' perspective on metoclopramide. Data were obtained from two popular health-related forums, Drugs.com and WebMD.com. Web scraping was used to obtain the necessary data in tabulated form. This dataset is archived at DANS/EASY and is not accessible here.
Chemical Structure Identification in Metabolomics: Computational Modeling of Experimental Features
The identification of compounds in complex mixtures remains challenging despite recent advances in analytical techniques. At present, no single method can detect and quantify the vast array of compounds that might be of potential interest in metabolomics studies. High performance liquid chromatography/mass spectrometry (HPLC/MS) is often considered the analytical method of choice for analysis of biofluids. The positive identification of an unknown involves matching at least two orthogonal HPLC/MS measurements (exact mass, retention index, drift time, etc.) against an authentic standard. However, due to the limited availability of authentic standards, an alternative approach involves matching known and measured features of the unknown compound with computationally predicted features for a set of candidate compounds downloaded from a chemical database. Computationally predicted features include retention index, ECOM50 (energy required to decompose 50% of a selected precursor ion in a collision induced dissociation cell), drift time, whether the unknown compound is biological or synthetic, and a collision induced dissociation (CID) spectrum. Computational predictions are used to filter the initial "bin" of candidate compounds. The final output is a ranked list of candidates that best match the known and measured features. In this mini review, we discuss cheminformatics methods underlying this database search-filter identification approach.
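The search-filter idea summarized above (filter a bin of candidates by predicted features, then rank the survivors) can be sketched in a few lines. This is an illustration, not the method from the review: the `Candidate` fields, the tolerance values, and the additive error score are all assumptions, and only two of the listed features (retention index and drift time) are used.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    name: str
    predicted_ri: float        # predicted retention index
    predicted_drift_ms: float  # predicted drift time in ms


def rank_candidates(candidates: List[Candidate],
                    measured_ri: float,
                    measured_drift_ms: float,
                    ri_tol: float = 50.0,
                    drift_tol: float = 1.0) -> List[str]:
    """Drop candidates outside either tolerance, then rank by total scaled error."""
    scored = []
    for c in candidates:
        ri_err = abs(c.predicted_ri - measured_ri) / ri_tol
        drift_err = abs(c.predicted_drift_ms - measured_drift_ms) / drift_tol
        if ri_err <= 1.0 and drift_err <= 1.0:  # filter step
            scored.append((ri_err + drift_err, c.name))
    return [name for _, name in sorted(scored)]  # best match first


# Hypothetical candidate bin for one unknown.
candidate_bin = [
    Candidate("A", 500.0, 10.0),
    Candidate("B", 520.0, 10.4),
    Candidate("C", 700.0, 10.0),  # retention index far outside tolerance
]
print(rank_candidates(candidate_bin, measured_ri=505.0, measured_drift_ms=10.1))
```

Scaling each error by its tolerance puts features with different units on a comparable footing before they are summed. Other scoring schemes are equally plausible.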
In Silico Enzymatic Synthesis of a 400,000 Compound Biochemical Database for Nontargeted Metabolomics
Current methods of structure identification in mass-spectrometry-based nontargeted metabolomics rely on matching experimentally determined features of an unknown compound to those of candidate compounds contained in biochemical databases. A major limitation of this approach is the relatively small number of compounds currently included in these databases. If the correct structure is not present in a database, it cannot be identified, and if it cannot be identified, it cannot be included in a database. Thus, there is an urgent need to augment metabolomics databases with rationally designed biochemical structures using alternative means. Here we present the In Vivo/In Silico Metabolites Database (IIMDB), a database of in silico enzymatically synthesized metabolites, to partially address this problem. The database, which is available at http://metabolomics.pharm.uconn.edu/iimdb/, includes ~23,000 known compounds (mammalian metabolites, drugs, secondary plant metabolites, and glycerophospholipids) collected from existing biochemical databases plus more than 400,000 computationally generated human phase-I and phase-II metabolites of these known compounds. IIMDB features a user-friendly web interface and a programmer-friendly RESTful web service. Ninety-five percent of the computationally generated metabolites in IIMDB were not found in any existing database. However, 21,640 were identical to compounds already listed in PubChem, HMDB, KEGG, or HumanCyc. Furthermore, the vast majority of these in silico metabolites were scored as biological using BioSM, a software program that identifies biochemical structures in chemical structure space. These results suggest that in silico biochemical synthesis represents a viable approach for significantly augmenting biochemical databases for nontargeted metabolomics applications.