
    Challenges for Chemoinformatics Education in Drug Discovery

    Surveys the curriculum developed at Indiana University for teaching cheminformatics in the IU School of Informatics.

    Open Babel: An open chemical toolbox

    Background: A frequent problem in computational modeling is the interconversion of chemical structures between different formats. While standard interchange formats exist (for example, Chemical Markup Language) and de facto standards have arisen (for example, SMILES format), the need to interconvert formats is a continuing problem due to the multitude of different application areas for chemistry data, differences in the data stored by different formats (0D versus 3D, for example), and competition between software along with a lack of vendor-neutral formats.
    Results: We discuss, for the first time, Open Babel, an open-source chemical toolbox that speaks the many languages of chemical data. Open Babel version 2.3 interconverts over 110 formats. The need to represent such a wide variety of chemical and molecular data requires a library that implements a wide range of cheminformatics algorithms, from partial charge assignment and aromaticity detection to bond order perception and canonicalization. We detail the implementation of Open Babel, describe key advances in the 2.3 release, and outline a variety of uses, both in software products and in scientific research, including applications far beyond simple format interconversion.
    Conclusions: Open Babel presents a solution to the proliferation of multiple chemical file formats. In addition, it provides a variety of useful utilities, from conformer searching and 2D depiction to filtering, batch conversion, and substructure and similarity searching. For developers, it can be used as a programming library to handle chemical data in areas such as organic chemistry, drug design, materials science, and computational chemistry. It is freely available under an open-source license fro
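Open Babel itself is a C++ library, and the toy Python below does not use its API. It is a minimal sketch, under stated assumptions, of a common way to keep many-format interconversion tractable: every format gets one reader into a shared in-memory molecule and one writer out of it, so N formats need roughly 2N routines instead of N² pairwise converters. The `Molecule` class, the `READERS`/`WRITERS` registries, and `convert` are hypothetical names; only the XYZ layout (atom count, comment line, then `element x y z` rows) and Hill-order formulas are standard conventions.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class Molecule:
    """Hypothetical minimal shared representation: a title plus (element, x, y, z) atoms."""
    title: str = ""
    atoms: List[Tuple[str, float, float, float]] = field(default_factory=list)

def read_xyz(text: str) -> Molecule:
    """Parse XYZ: atom count, comment line, then one 'element x y z' row per atom."""
    lines = text.strip("\n").splitlines()
    n_atoms = int(lines[0].strip())
    mol = Molecule(title=lines[1].strip())
    for row in lines[2:2 + n_atoms]:
        symbol, x, y, z = row.split()[:4]
        mol.atoms.append((symbol, float(x), float(y), float(z)))
    return mol

def write_xyz(mol: Molecule) -> str:
    rows = [f"{s} {x:.4f} {y:.4f} {z:.4f}" for s, x, y, z in mol.atoms]
    return "\n".join([str(len(mol.atoms)), mol.title, *rows]) + "\n"

def write_hill_formula(mol: Molecule) -> str:
    """Hill order: C, then H, then other elements alphabetically (all alphabetical if no C)."""
    counts = Counter(atom[0] for atom in mol.atoms)
    if "C" in counts:
        others = sorted(e for e in counts if e not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + others
    else:
        order = sorted(counts)
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in order)

# One reader and one writer per format; every conversion passes through Molecule
READERS: Dict[str, Callable[[str], Molecule]] = {"xyz": read_xyz}
WRITERS: Dict[str, Callable[[Molecule], str]] = {"xyz": write_xyz, "formula": write_hill_formula}

def convert(text: str, in_format: str, out_format: str) -> str:
    return WRITERS[out_format](READERS[in_format](text))

water = """3
water
O 0.0000 0.0000 0.1173
H 0.0000 0.7572 -0.4692
H 0.0000 -0.7572 -0.4692
"""
print(convert(water, "xyz", "formula"))  # H2O
```

Adding a format in this design means registering one reader or one writer; no existing converter has to change.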

    Toward a Standardized Strategy of Clinical Metabolomics for the Advancement of Precision Medicine

    Despite tremendous success, pitfalls have been observed at every step of a clinical metabolomics workflow, impeding the internal validity of studies. Furthermore, the demand for logistics, instrumentation, and computational resources in metabolic phenotyping studies has far exceeded expectations. In this conceptual review, we cover the full range of barriers facing a metabolomics-based clinical study and suggest potential solutions in the hope of enhancing study robustness, usability, and transferability. The importance of quality assurance and quality control procedures is discussed, followed by a practical rule containing five phases, including two additional "pre-pre-" and "post-post-" analytical steps. We also elucidate the potential involvement of machine learning and argue that automated data mining algorithms are clearly needed to improve the quality of future research. Consequently, we propose a comprehensive metabolomics framework, along with a checklist refined from current guidelines and our previously published assessment, in an attempt to accurately translate achievements in metabolomics into clinical and epidemiological research. Furthermore, the integration of multifaceted multi-omics approaches with metabolomics as the pillar member is urgently needed. When combined with other social or nutritional factors, these approaches can yield complete omics profiles for a particular disease. Our discussion reflects the current obstacles and potential solutions in the growing use of metabolomics in clinical research to create the next-generation healthcare system.
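As one concrete quality-control step (a widely used practice in untargeted metabolomics, not necessarily part of the checklist proposed in this review), features can be filtered by their coefficient of variation (CV) across repeated injections of a pooled QC sample. The 30% cutoff and the function names below are assumptions chosen for illustration.

```python
from statistics import mean, stdev
from typing import Dict, List

def qc_cv_percent(intensities: List[float]) -> float:
    """Relative standard deviation (%) of one feature across pooled-QC injections."""
    return stdev(intensities) / mean(intensities) * 100.0

def filter_features_by_qc_cv(qc_table: Dict[str, List[float]], max_cv: float = 30.0) -> Dict[str, float]:
    """Keep features whose QC CV is at or below max_cv (an assumed threshold).

    qc_table maps a feature ID to its intensities in the pooled-QC injections;
    the result maps each retained feature ID to its CV.
    """
    kept = {}
    for feature_id, values in qc_table.items():
        if len(values) < 2 or mean(values) <= 0:
            continue  # CV is undefined or meaningless here, so drop the feature
        cv = qc_cv_percent(values)
        if cv <= max_cv:
            kept[feature_id] = cv
    return kept

# Hypothetical QC intensities for two features
qc = {
    "feature_A": [100.0, 102.0, 98.0, 101.0],
    "feature_B": [100.0, 150.0, 60.0, 120.0],
}
print(filter_features_by_qc_cv(qc))  # only feature_A is retained
```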

    Development of Tools for Atom-Level Interpretation of Stable Isotope-Resolved Metabolomics Datasets

    Metabolomics is the global study of small molecules in living systems under a given state and is emerging as a new 'omics' field in systems biology. It has shown great promise in elucidating biological mechanisms in various areas. Many diseases, especially cancers, are closely linked to reprogrammed metabolism. As the end point of biological processes, metabolic profiles are more representative of the biological phenotype than genomic or proteomic profiles. Therefore, characterizing the metabolic phenotypes of various diseases will help clarify their metabolic mechanisms and promote the development of novel and effective treatment strategies. Advances in analytical technologies such as nuclear magnetic resonance and mass spectrometry greatly contribute to the detection and characterization of global metabolites in a biological system. Furthermore, applying these analytical tools to stable isotope-resolved metabolomics (SIRM) experiments can generate large-scale, high-quality metabolomics data that capture isotopic flow through cellular metabolism. However, the lack of corresponding computational analysis tools hinders the characterization of metabolic phenotypes and downstream applications. Both detailed metabolic modeling and quantitative analysis are required to properly interpret these complex metabolomics data. For metabolic modeling, there is currently no comprehensive atom-resolved metabolic network from which context-specific metabolic models can be derived for SIRM datasets. For quantitative analysis, most available tools perform metabolic flux analysis based on a well-defined metabolic model, which is hard to obtain for complex biological systems given the limits of our knowledge. Here, we developed a set of methods to address these problems. First, we developed a neighborhood-specific coloring method that creates an identifier for each atom in a specific compound.
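The coloring method is not detailed here, so the sketch below substitutes a generic Morgan/Weisfeiler-Lehman-style refinement as a stand-in, not the author's algorithm: each atom starts with its element symbol, and in each round its color becomes a hash of its own color plus the sorted colors of its bonded neighbors. After `radius` rounds, atoms whose neighborhoods match within that radius receive the same identifier even in different molecules, which is the property needed to compare atoms across databases. Bond orders, charges, and stereochemistry are ignored purely for brevity, and `atom_identifiers` is a hypothetical name.

```python
import hashlib
from typing import List, Sequence, Tuple

def _digest(text: str) -> str:
    # Short deterministic hash, so identifiers are comparable across molecules
    return hashlib.sha1(text.encode("utf-8")).hexdigest()[:12]

def atom_identifiers(elements: Sequence[str], bonds: Sequence[Tuple[int, int]], radius: int = 2) -> List[str]:
    """Iteratively color atoms by their bonded neighborhood (WL-style sketch)."""
    neighbors: List[List[int]] = [[] for _ in elements]
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)
    colors = list(elements)  # round 0: element symbol only
    for _ in range(radius):
        # Each new color combines the atom's current color with its neighbors' current colors
        colors = [
            _digest(colors[i] + "|" + ",".join(sorted(colors[j] for j in neighbors[i])))
            for i in range(len(elements))
        ]
    return colors

# Heavy-atom graphs (hydrogens omitted): ethanol C-C-O and dimethyl ether C-O-C
ethanol = atom_identifiers(["C", "C", "O"], [(0, 1), (1, 2)])
ether = atom_identifiers(["C", "O", "C"], [(0, 1), (1, 2)])
print(len(set(ethanol)), ether[0] == ether[2])  # prints: 3 True
```

With `radius=1`, the terminal carbons of ethanol and propane both see one carbon neighbor and share an identifier; at `radius=2` they diverge because the neighboring carbon in ethanol is bonded to oxygen.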
With these atom identifiers, we successfully harmonized compounds and reactions across the KEGG and MetaCyc databases at various levels and evaluated the atom mappings of the harmonized metabolic reactions. These results will contribute to the construction of a comprehensive atom-resolved metabolic network. Moreover, the method can be easily applied to any metabolic database that provides molfile representations of compounds, which will facilitate future expansion. We also developed a moiety modeling framework that deconvolutes metabolite isotopologue profiles using moiety models and analyzes and selects the best moiety model(s) based on the experimental data. To our knowledge, this is the first method that can analyze datasets involving multiple isotope tracers. Furthermore, instead of relying on a single predefined metabolic model, this method allows the comparison of multiple metabolic models derived from a given metabolic profile, and we demonstrated the robust performance of the moiety modeling framework in model selection with a 13C-labeled UDP-GlcNAc isotopologue dataset. We further explored the data quality requirements and the factors that affect model selection. Collectively, these methods and tools help interpret SIRM metabolomics datasets, from metabolic modeling to quantitative analysis.
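Moiety models treat a metabolite as assembled from subunits (moieties) that are labeled independently, so the isotopologue profile of the whole metabolite is the convolution of the per-moiety labeling distributions. The sketch below shows only this forward step under simplifying assumptions (a single 13C tracer, each moiety either fully labeled or unlabeled, natural isotope abundance ignored); a deconvolution framework would fit the moiety parameters so that the forward prediction matches the measured profile. `moiety_distribution` and `isotopologue_profile` are hypothetical names.

```python
from typing import List, Sequence, Tuple

def moiety_distribution(n_carbons: int, labeled_fraction: float) -> List[float]:
    """P(k labeled carbons) for a moiety that is either fully 13C-labeled or unlabeled."""
    dist = [0.0] * (n_carbons + 1)
    dist[0] = 1.0 - labeled_fraction
    dist[n_carbons] += labeled_fraction  # '+=' keeps n_carbons == 0 consistent
    return dist

def convolve(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Distribution of the summed label count of two independent moieties."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, pa in enumerate(a):
        for j, pb in enumerate(b):
            out[i + j] += pa * pb
    return out

def isotopologue_profile(moieties: Sequence[Tuple[int, float]]) -> List[float]:
    """Combine (n_carbons, labeled_fraction) moieties into M+0, M+1, ... fractions."""
    profile = [1.0]
    for n_carbons, fraction in moieties:
        profile = convolve(profile, moiety_distribution(n_carbons, fraction))
    return profile

# Hypothetical metabolite: a 2-carbon moiety (50% labeled) and a 1-carbon moiety (20% labeled)
print([round(p, 3) for p in isotopologue_profile([(2, 0.5), (1, 0.2)])])  # [0.4, 0.1, 0.4, 0.1]
```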

    Development of quantitative structure property relationships to support non-target LC-HRMS screening

    Over the last decade, a large number of emerging pollutants have been detected and identified in surface waters and wastewater, raising concern for the aquatic ecosystem because of their potential chemical stability. Liquid chromatography coupled to high-resolution mass spectrometry (LC-HRMS) is an effective technique for detecting emerging pollutants in the environment. Analyzing samples simultaneously with the complementary techniques of reversed-phase liquid chromatography (RPLC) and hydrophilic interaction liquid chromatography (HILIC) helps identify "suspect" and unknown pollutants with diverse physicochemical properties. Their identification requires that specific criteria be met, which are assessed with diagnostic tools such as accurate retention time prediction, in silico fragmentation, and prediction of ionization behavior. Chapter 3 of this doctoral thesis describes the development of an integrated workflow for investigating the parameters that affect the elution time of a large number of compounds classed as emerging pollutants. To this end, more than 2,500 emerging pollutants were used to develop retention time prediction models for the two liquid chromatographic techniques (RP- and HILIC-LC-HRMS) and for electrospray ionization in both positive and negative mode (+/-ESI). The model was then applied to computationally predict retention times in order to identify 10 new transformation products of the pharmaceutical compounds tramadol, furosemide, and niflumic acid after ozone treatment.
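A retention time model of this kind relates retention time to molecular descriptors. The thesis's actual descriptors and learning algorithms are not given here, so the sketch below assumes the simplest case: an ordinary least-squares fit of retention time on two hypothetical descriptors (for instance, a hydrophobicity descriptor such as logD and a size descriptor), solved through the normal equations. The data are synthetic, generated from a known linear rule so the fit can be checked, and `fit_linear_qsrr` is a hypothetical name.

```python
from typing import List, Sequence

def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gauss-Jordan elimination with partial pivoting for a small square system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: descriptors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_linear_qsrr(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """OLS via (X^T X) b = X^T y; returns [intercept, coef_1, ..., coef_p]."""
    rows = [[1.0, *map(float, x)] for x in X]  # prepend an intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return _solve(xtx, xty)

def predict(coefs: Sequence[float], x: Sequence[float]) -> float:
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], x))

# Synthetic descriptors and retention times from rt = 1.0 + 2.0*d1 + 0.5*d2
descriptors = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [0.0, 1.0]]
retention = [1.0 + 2.0 * d1 + 0.5 * d2 for d1, d2 in descriptors]
coefs = fit_linear_qsrr(descriptors, retention)
print([round(c, 6) for c in coefs])  # recovers [1.0, 2.0, 0.5]
```

In practice the descriptors would be computed from chemical structure, and a model would be validated on held-out compounds before its predictions were used to rank candidate identities.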
Chapter 4 presents the development of a novel generalized chemometric model that can predict the elution time of any potential pollutant regardless of the liquid chromatographic method used, contributing substantially to the comparison of results from different LC-HRMS methods. This model was used to identify "suspect" and unknown compounds in interlaboratory trials. Chapter 5 describes the development of a computational model for predicting the toxicity of emerging pollutants detected in the aquatic ecosystem. The model aims to estimate the potential environmental risk of new compounds identified through suspect and non-target screening, for which experimental toxicity data are not yet available. Finally, chapter 6 presents an automated and systematic approach to suspect and non-target screening of LC-HRMS data. This new automated workflow aims to make HRMS data processing less time-consuming and to make non-target screening applicable to routine monitoring and/or use by regulatory authorities.
    Over the last decade, a large number of emerging contaminants have been detected and identified in surface water and wastewater that could threaten the aquatic environment due to their pseudo-persistence. As described in chapters 1 and 2, liquid chromatography coupled to high-resolution mass spectrometry (LC-HRMS) can be used as an efficient tool for their screening. Simultaneous screening of these samples by hydrophilic interaction liquid chromatography (HILIC) and reversed-phase (RP) chromatography would help with the full identification of suspects and unknown compounds.
However, to confirm the identity of the most relevant suspect or unknown compounds, their chemical properties, such as retention time behavior, MSn fragmentation, and ionization modes, should be investigated. Chapter 3 of this thesis discusses the development of a comprehensive workflow to study the retention time behavior of large groups of compounds belonging to emerging contaminants. A dataset consisting of more than 2,500 compounds was used for RP/HILIC-LC-HRMS, and their retention times were derived in both positive and negative electrospray ionization modes (+/-ESI). These in silico approaches were then applied to the identification of 10 new transformation products of tramadol, furosemide, and niflumic acid (under ozonation treatment). Chapter 4 discusses the development of the first retention time index (RTI) system for LC-HRMS. Some practical applications of this RTI system in suspect and non-target screening in collaborative trials are presented as well. Chapter 5 describes the development of in silico toxicity models to estimate the acute toxicity of emerging pollutants in the aquatic environment. These models would help link suspect/non-target screening results to tentative environmental risk by predicting the toxicity of newly, tentatively identified compounds. Chapter 6 introduces an automatic and systematic way to perform suspect and non-target screening of LC-HRMS data. This would reduce the time and data analysis load and enable the routine application of non-target screening for regulatory or monitoring purposes.
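Automated suspect screening typically begins by matching measured m/z values against adduct masses computed from a suspect list, within a mass-accuracy tolerance expressed in parts per million (ppm). The sketch below assumes only protonated ([M+H]+) or deprotonated ([M-H]-) adducts and a 5 ppm tolerance; the suspect list reuses two parent compounds named above, and the function names are hypothetical. A full workflow would also check isotope patterns, retention time plausibility, and MS/MS fragments, as the abstract notes.

```python
import re
from typing import Dict, List, Tuple

# Monoisotopic masses (u) of the most abundant isotope of each element used below
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "S": 31.97207100,
    "Cl": 34.96885268,
}
PROTON = 1.007276467  # proton mass (u)

def neutral_mass(formula: str) -> float:
    """Monoisotopic neutral mass from a simple formula such as 'C16H25NO2'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total

def ppm_error(measured: float, theoretical: float) -> float:
    return (measured - theoretical) / theoretical * 1e6

def match_suspects(measured_mz: float, suspects: Dict[str, str],
                   mode: str = "+", tol_ppm: float = 5.0) -> List[Tuple[str, float]]:
    """Return (name, ppm error) for suspects whose [M+H]+ ('+') or [M-H]- ('-') m/z is within tol_ppm."""
    if mode not in ("+", "-"):
        raise ValueError("mode must be '+' or '-'")
    shift = PROTON if mode == "+" else -PROTON
    hits = []
    for name, formula in suspects.items():
        err = ppm_error(measured_mz, neutral_mass(formula) + shift)
        if abs(err) <= tol_ppm:
            hits.append((name, round(err, 2)))
    return hits

suspects = {"tramadol": "C16H25NO2", "furosemide": "C12H11ClN2O5S"}
print(match_suspects(264.1960, suspects, mode="+"))
print(match_suspects(329.0005, suspects, mode="-"))
```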

    The CHEMDNER corpus of chemicals and drugs and its annotation principles

    The automatic extraction of chemical information from text requires the recognition of chemical entity mentions as one of its key steps. When developing supervised named entity recognition (NER) systems, a large, manually annotated text corpus is desirable. Furthermore, large corpora permit the robust evaluation and comparison of different approaches that detect chemicals in documents. We present the CHEMDNER corpus, a collection of 10,000 PubMed abstracts that contain a total of 84,355 chemical entity mentions labeled manually by expert chemistry literature curators, following annotation guidelines specifically defined for this task. The abstracts of the CHEMDNER corpus were selected to be representative of all major chemical disciplines. Each chemical entity mention was manually labeled according to its structure-associated chemical entity mention (SACEM) class: abbreviation, family, formula, identifier, multiple, systematic, and trivial. The difficulty and consistency of tagging chemicals in text were measured in an agreement study between annotators, which obtained a percentage agreement of 91%. For a subset of the CHEMDNER corpus (the test set of 3,000 abstracts), we provide not only the Gold Standard manual annotations but also the mentions automatically detected by the 26 teams that participated in the BioCreative IV CHEMDNER chemical mention recognition task. In addition, we release the CHEMDNER silver standard corpus of automatically extracted mentions from 17,000 randomly selected PubMed abstracts. A version of the CHEMDNER corpus in the BioC format has been generated as well. We propose a standard for the minimum information required about entity annotations when constructing domain-specific corpora of chemical and drug entities. The CHEMDNER corpus and annotation guidelines are available at: http://www.biocreative.org/resources/biocreative-iv/chemdner-corpus
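Comparing automatically detected mentions against Gold Standard annotations is usually scored with precision, recall, and F-score over exact span matches. The sketch below is a minimal version of that idea, assuming each mention is encoded as a hypothetical `(document_id, start_offset, end_offset)` tuple; it does not reproduce the official BioCreative evaluation script or its exact rules.

```python
from typing import Iterable, Tuple

Mention = Tuple[str, int, int]  # (document id, start offset, end offset): an assumed encoding

def exact_match_scores(gold: Iterable[Mention], predicted: Iterable[Mention]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 where only identical spans count as matches."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical annotations for two abstracts
gold = [("d1", 0, 7), ("d1", 20, 28), ("d2", 5, 12), ("d2", 30, 35)]
predicted = [("d1", 0, 7), ("d2", 5, 12), ("d2", 40, 44)]
print(tuple(round(s, 3) for s in exact_match_scores(gold, predicted)))  # (0.667, 0.5, 0.571)
```

Exact matching is strict: a prediction covering `("d1", 0, 6)` earns no credit against `("d1", 0, 7)`, which is why span boundaries matter so much in annotation guidelines.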