Self-organizing ontology of biochemically relevant small molecules
Background: The advent of high-throughput experimentation in biochemistry has led to the generation of vast amounts of chemical data, necessitating the development of novel analysis, characterization, and cataloguing techniques and tools. Recently, a movement to publicly release such data has advanced biochemical structure-activity relationship research, while posing new challenges, the biggest being the curation, annotation, and classification of this information to facilitate useful biochemical pattern analysis. Unfortunately, the human resources currently employed by the organizations supporting these efforts (e.g. ChEBI) are expanding linearly, while new useful scientific information is being released in a seemingly exponential fashion. Compounding this, existing chemical classification and annotation systems are not amenable to automated classification, formal and transparent chemical class definition axiomatization, facile class redefinition, or novel class integration, thus further limiting chemical ontology growth by necessitating human involvement in curation. Clearly, there is a need to automate this process, especially for novel chemical entities of biological interest.
Results: To address this, we present a formal framework based on Semantic Web technologies for the automatic design of a chemical ontology that can be used for automated classification of novel entities. We demonstrate the automatic self-assembly of a structure-based chemical ontology based on 60 MeSH and 40 ChEBI chemical classes. This ontology is then used to classify 200 compounds with an accuracy of 92.7%. We extend these structure-based classes with molecular feature information and demonstrate the utility of our framework for classification of functionally relevant chemicals. Finally, we discuss an iterative approach that we envision for future biochemical ontology development.
Conclusions: We conclude that the proposed methodology can ease the burden of chemical data annotators and dramatically increase their productivity. We anticipate that the use of formal logic in our proposed framework will make chemical classification criteria more transparent to humans and machines alike and will thus facilitate predictive and integrative bioactivity model development.
Software Tools and Approaches for Compound Identification of LC-MS/MS Data in Metabolomics.
The annotation of small molecules remains a major challenge in untargeted mass spectrometry-based metabolomics. We here critically discuss structured elucidation approaches and software that are designed to help during the annotation of unknown compounds. Only by elucidating unknown metabolites first is it possible to biologically interpret complex systems, to map compounds to pathways and to create reliable predictive metabolic models for translational and clinical research. These strategies include the construction and quality of tandem mass spectral databases such as the coalition of MassBank repositories and investigations of MS/MS matching confidence. We present in silico fragmentation tools such as MS-FINDER, CFM-ID, MetFrag, ChemDistiller and CSI:FingerID that can annotate compounds from existing structure databases and that have been used in the CASMI (critical assessment of small molecule identification) contests. Furthermore, the use of retention time models from liquid chromatography and the utility of collision cross-section modelling from ion mobility experiments are covered. Workflows and published examples of successfully annotated unknown compounds are included.
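As an illustration of what MS/MS matching against a spectral library can look like, the following is a minimal sketch of a cosine similarity score between two centroided spectra. It is not taken from any of the tools named above: the function names, the fixed-width binning, and the `bin_width` default are assumptions made for this example, and production tools typically use tolerance-based peak matching and intensity weighting instead.

```python
import math


def bin_spectrum(peaks, bin_width=0.01):
    """Sum intensities of (m/z, intensity) peaks into fixed-width m/z bins.

    Fixed-width binning is a simplifying assumption for this sketch.
    """
    binned = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(peaks_a, peaks_b, bin_width=0.01):
    """Cosine of the angle between two binned spectra (0 = no shared peaks)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical spectra, invented purely for illustration.
library_spectrum = [(85.03, 100.0), (127.04, 40.0), (145.05, 15.0)]
query_spectrum = [(85.03, 90.0), (127.04, 55.0), (163.06, 10.0)]
similarity = cosine_similarity(query_spectrum, library_spectrum)
```

A score near 1 indicates that the two spectra share most of their intensity in the same bins; how such a score translates into annotation confidence is exactly the kind of question the review discusses.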
Freeze-drying modeling and monitoring using a new neuro-evolutive technique
This paper focuses on the design of a black-box model for the freeze-drying of pharmaceuticals. A new methodology based on a self-adaptive differential evolution scheme is combined with a back-propagation algorithm, used as a local search method, for the simultaneous structural and parametric optimization of the model, which is represented by a neural network. Using the model of the freeze-drying process, both the temperature and the residual ice content in the product vs. time can be determined off-line, given the values of the operating conditions (the temperature of the heating shelf and the pressure in the drying chamber). This makes it possible to determine whether the maximum temperature allowed by the product is exceeded and when sublimation drying is complete, thus providing a valuable tool for recipe design and optimization. In addition, the black-box model can be applied to monitor the freeze-drying process: in this case, the measured product temperature is used as an input variable of the neural network in order to provide in-line estimation of the state of the product (temperature and residual amount of ice). Various examples are presented and discussed, thus pointing out the strength of the tool.
On Weight Matrix and Free Energy Models for Sequence Motif Detection
The problem of motif detection can be formulated as the construction of a
discriminant function to separate sequences of a specific pattern from
background. In computational biology, motif detection is used to predict DNA
binding sites of a transcription factor (TF), mostly based on the weight matrix
(WM) model or the Gibbs free energy (FE) model. However, despite the wide
applications, theoretical analysis of these two models and their predictions is
still lacking. We derive asymptotic error rates of prediction procedures based
on these models under different data generation assumptions. This allows a
theoretical comparison between the WM-based and the FE-based predictions in
terms of asymptotic efficiency. Applications of the theoretical results are
demonstrated with empirical studies on ChIP-seq data and protein binding
microarray data. We find that, irrespective of underlying data generation
mechanisms, the FE approach shows higher or comparable predictive power
relative to the WM approach when the number of observed binding sites used for
constructing a discriminant decision is not too small. Comment: 23 pages, 1 figure and 4 tables
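To make the idea of a discriminant function concrete, here is a minimal sketch of a weight-matrix scorer: a log-odds matrix is estimated from aligned binding sites, and a candidate sequence is called a site when its summed score exceeds a threshold. The function names, the pseudocount, the uniform background, and the threshold of 0 are assumptions chosen for illustration; the sketch covers only the WM side and does not reproduce the paper's FE model or its error-rate analysis.

```python
import math

ALPHABET = "ACGT"


def log_odds_matrix(sites, background=None, pseudocount=1.0):
    """Estimate a position weight matrix of log-odds scores from aligned sites."""
    if background is None:
        # Assumption: uniform background base frequencies.
        background = {base: 0.25 for base in ALPHABET}
    width = len(sites[0])
    matrix = []
    for pos in range(width):
        counts = {base: pseudocount for base in ALPHABET}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        matrix.append(
            {base: math.log((counts[base] / total) / background[base])
             for base in ALPHABET}
        )
    return matrix


def score(matrix, seq):
    """Sum the per-position log-odds scores of a sequence."""
    return sum(column[base] for column, base in zip(matrix, seq))


def predict(matrix, seq, threshold=0.0):
    """Discriminant decision: True if the sequence scores above the threshold."""
    return score(matrix, seq) > threshold


# Hypothetical aligned sites, invented purely for illustration.
training_sites = ["TATAAT", "TATAAT", "TATGAT", "TACAAT"]
wm = log_odds_matrix(training_sites)
```

With these toy sites, `predict(wm, "TATAAT")` is true and `predict(wm, "GCGCGC")` is false; in practice the threshold would be tuned to trade off false positives against false negatives.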
Machine Learning of Molecular Electronic Properties in Chemical Compound Space
The combination of modern scientific computing with electronic structure
theory can lead to an unprecedented amount of data amenable to intelligent data
analysis for the identification of meaningful, novel, and predictive
structure-property relationships. Such relationships enable high-throughput
screening for relevant properties in an exponentially growing pool of virtual
compounds that are synthetically accessible. Here, we present a machine
learning (ML) model, trained on a database of ab initio calculation
results for thousands of organic molecules, that simultaneously predicts
multiple electronic ground- and excited-state properties. The properties
include atomization energy, polarizability, frontier orbital eigenvalues,
ionization potential, electron affinity, and excitation energies. The ML model
is based on a deep multi-task artificial neural network, exploiting underlying
correlations between various molecular properties. The input is identical to
that of ab initio methods, i.e., nuclear charges and Cartesian coordinates
of all atoms. For small organic molecules the accuracy of such a "Quantum
Machine" is similar, and sometimes superior, to modern quantum-chemical
methods, at negligible computational cost.
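The abstract does not say how nuclear charges and Cartesian coordinates are encoded before they reach the network. As one possible illustration, the sketch below builds a Coulomb matrix, a descriptor known from the machine-learning-for-quantum-chemistry literature that depends only on these two inputs. Using it here is an assumption of this example, not a statement about the paper's pipeline, and the function name and the toy H2 geometry are hypothetical.

```python
import math


def coulomb_matrix(charges, coords):
    """Build a Coulomb matrix from nuclear charges and Cartesian coordinates.

    Diagonal:     0.5 * Z_i ** 2.4
    Off-diagonal: Z_i * Z_j / |R_i - R_j|
    Coordinates are assumed to be in atomic units (bohr).
    """
    n = len(charges)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                matrix[i][j] = 0.5 * charges[i] ** 2.4
            else:
                distance = math.dist(coords[i], coords[j])
                matrix[i][j] = charges[i] * charges[j] / distance
    return matrix


# Toy example: H2 with an assumed bond length of 1.4 bohr.
h2_charges = [1, 1]
h2_coords = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.4)]
h2_matrix = coulomb_matrix(h2_charges, h2_coords)
```

The matrix is symmetric and invariant to translation and rotation of the molecule, which is one reason descriptors of this kind are used as model inputs.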