396 research outputs found

    GO faster ChEBI with Reasonable Biochemistry

    Chemical Entities of Biological Interest (ChEBI) is a database and ontology that represents biochemical knowledge about small molecules. Recent changes to the ontology have created new opportunities for automated reasoning with description logic that have not previously been fully exploited in chemistry. These changes open up the possibility of building an improved chemical semantic web by making more use of necessary and sufficient conditions, allowing reasoning about chemical structure, highlighting ambiguous inconsistencies and improving alignment with the Gene Ontology (GO). This paper briefly discusses some of the problems with reasoning over the current version of ChEBI and potential solutions to these issues.

    Using Neural Networks for Relation Extraction from Biomedical Literature

    Using different sources of information to support the automated extraction of relations between biomedical concepts contributes to the development of our understanding of biological systems. The primary comprehensive source of these relations is the biomedical literature. Several relation extraction approaches have been proposed to identify relations between concepts in biomedical literature, notably those using neural network algorithms. The use of multichannel architectures composed of multiple data representations, as in deep neural networks, is leading to state-of-the-art results. The right combination of data representations can eventually lead to even higher evaluation scores in relation extraction tasks. Biomedical ontologies therefore play a fundamental role by providing semantic and ancestry information about an entity. The incorporation of biomedical ontologies has already been shown to improve on previous state-of-the-art results. Comment: Artificial Neural Networks book (Springer) - Chapter 1

    The Infectious Disease Ontology in the Age of COVID-19

    The Infectious Disease Ontology (IDO) is a suite of interoperable ontology modules that aims to provide coverage of all aspects of the infectious disease domain, including biomedical research, clinical care, and public health. IDO Core is designed to be a disease- and pathogen-neutral ontology, covering just those types of entities and relations that are relevant to infectious diseases generally. IDO Core is then extended by a collection of ontology modules focusing on specific diseases and pathogens. In this paper we present applications of IDO Core within various areas of infectious disease research, together with an overview of all IDO extension ontologies and the methodology on the basis of which they are built. We also survey recent developments involving IDO, including the creation of IDO Virus; the Coronaviruses Infectious Disease Ontology (CIDO); and an extension of CIDO focused on COVID-19 (IDO-CovID-19). We also discuss how these ontologies might assist in information-driven efforts to deal with the ongoing COVID-19 pandemic, to accelerate data discovery in the early stages of future pandemics, and to promote reproducibility of infectious disease research.

    Chemical Entities of Biological Interest: an update

    Chemical Entities of Biological Interest (ChEBI) is a freely available dictionary of molecular entities focused on ‘small’ chemical compounds. The molecular entities in question are either natural products or synthetic products used to intervene in the processes of living organisms. Genome-encoded macromolecules (nucleic acids, proteins and peptides derived from proteins by cleavage) are not as a rule included in ChEBI. In addition to molecular entities, ChEBI contains groups (parts of molecular entities) and classes of entities. ChEBI includes an ontological classification, whereby the relationships between molecular entities or classes of entities and their parents and/or children are specified. ChEBI is available online at http://www.ebi.ac.uk/chebi/. This article reports on new features in ChEBI since the last NAR report in 2007, including substructure and similarity searching, a submission tool for authoring of ChEBI datasets by the community and a 30-fold increase in the number of chemical structures stored in ChEBI

    Structural and semantic similarity metrics for chemical compound classification

    Master's thesis, Biochemistry, Universidade de Lisboa, Faculdade de Ciências, 2010. Over the last few decades, there has been a large increase in the amount of data produced and made available in chemistry, especially after the introduction of mechanized analysis methods. Because of this growth in the volume of data, there is an ever greater need to implement automatic computational systems capable of storing, studying and interpreting these data efficiently. One of the most important tasks in cheminformatics is, in fact, the use of laboratory data in systems for comparing and classifying chemical compounds. The most effective current methods rest on the premise that the function of a chemical compound is closely related to its structure. Although this premise is generally correct, as current methods demonstrate, these methods can fail, especially when similar molecules perform different functions (as with L- and D-amino acids) or when different molecules perform a similar biological function (as with numerous examples of inhibitors). The work proposed in this document presents a solution to this problem through a hybrid metric that integrates at its core not only structural but also semantic information; that is, the system developed is able to exploit information about the meaning of molecules in a biochemical context. To this end, I used ChEBI as the source of semantic information and created a tool called Chym (Chemical Hybrid Metric) that can handle chemical compound classification problems. 
In brief, to decide whether a chemical compound has a given property, for example whether it crosses the blood-brain barrier, the system assigns the compound an activity coefficient calculated from the chemical compounds known to have that property; by comparison with a cutoff value, Chym classifies the compound under study as having or not having the property. The tool that resulted from this thesis work was explored and validated here. The work presented shows substantial evidence supporting the effectiveness of Chym, since it yields better results than all the models with which it was compared. In particular, for three selected problems, Chym correctly classifies a compound 90.9%, 87.7% and 84.2% of the time: in the order given, these values refer to the classification of compounds as permeable to the blood-brain barrier, as P-glycoprotein substrates, or as ligands of an estrogen receptor. For comparison, these three problems had previously been solved with accuracies of 81.5%, 80.6% and 82.8% respectively. The hypothesis of the thesis is therefore confirmed, namely that integrating semantic information into systems for comparing and classifying chemical compounds increases, sometimes substantially, the reliability of the method. The objective of the thesis was thus achieved on two fronts. 
On the one hand the thesis served to validate the hypothesis, and on the other it culminated in the creation of a chemical compound classification tool that may be used in the future in broader projects, namely in the study of the evolution of metabolic pathways, in drug development, or in the preliminary analysis of the toxicity of chemical compounds. Over the last few decades, there has been an increasing number of attempts at creating systems capable of comparing and classifying chemical compounds based on their structure and/or physicochemical properties. While the rate of success of these approaches has been increasing, particularly with the introduction of new and ever more sophisticated methods of machine learning, there is still room for improvement. One of the problems of these methods is that they fail to consider that similar molecules may have different roles in nature, or, to a lesser extent, that disparate molecules may have similar roles. This thesis proposes the exploitation of the semantic properties of chemical compounds, as described in the ChEBI ontology, to create an efficient system able to automatically deal with the binary classification of chemical compounds. To that effect, I developed Chym (Chemical Hybrid Metric) as a tool that integrates structural and semantic information in a single hybrid metric. The work presented here shows substantial evidence supporting the effectiveness of Chym, since it has outperformed all the models with which it was compared. In particular, it achieved accuracy values of 90.9%, 87.7% and 84.2% when solving three classification problems which, previously, had only been solved with accuracy values of 81.5%, 80.6% and 82.8% respectively. Other results show that the tool is appropriate to use even if the problem at hand is not well represented in the ChEBI ontology. Thus, Chym shows that considering the semantic properties of a compound helps solve classification problems. 
Therefore, Chym can be used in projects that require the classification and/or the comparison of chemical compounds, such as the study of the evolution of metabolic pathways, drug discovery or preliminary toxicity analysis.
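
The thesis abstract above describes Chym as assigning a compound an activity coefficient derived from compounds known to have a property, then comparing that coefficient with a cutoff. The abstract does not give the formulas, so the following is only a minimal sketch under stated assumptions: the linear blend of structural and semantic similarity, the `weight` parameter, the use of a plain mean, and all function names and numbers are hypothetical and are not taken from Chym itself.

```python
from statistics import mean


def hybrid_similarity(structural, semantic, weight=0.5):
    """Blend two similarity scores in [0, 1].

    ASSUMPTION: a linear blend with a hypothetical mixing `weight`;
    the abstract does not say how Chym combines the two signals.
    """
    return weight * structural + (1.0 - weight) * semantic


def activity_coefficient(similarities_to_actives):
    """ASSUMPTION: average hybrid similarity to the known active compounds."""
    return mean(similarities_to_actives)


def classify(similarities_to_actives, cutoff=0.5):
    """Label the query compound as active when its coefficient reaches the cutoff."""
    return activity_coefficient(similarities_to_actives) >= cutoff


# Made-up (structural, semantic) similarities between one query compound
# and three compounds known to have the property of interest.
pairs = [(0.8, 0.6), (0.4, 0.9), (0.7, 0.7)]
scores = [hybrid_similarity(s, m) for s, m in pairs]
coefficient = activity_coefficient(scores)
is_active = classify(scores, cutoff=0.6)
print(f"activity coefficient = {coefficient:.3f}, active = {is_active}")
```

In a real setting the structural scores would come from a fingerprint comparison and the semantic scores from an ontology-based similarity measure over ChEBI; both are left as plain numbers here.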

    A rule-based ontological framework for the classification of molecules

    BACKGROUND: A variety of key activities within life sciences research involve integrating and intelligently managing large amounts of biochemical information. Semantic technologies provide an intuitive way to organise and sift through these rapidly growing datasets via the design and maintenance of ontology-supported knowledge bases. To this end, OWL, a W3C standard declarative language, has been extensively used in the deployment of biochemical ontologies that can be conveniently organised using the classification facilities of OWL-based tools. One of the most established ontologies for the chemical domain is ChEBI, an open-access dictionary of molecular entities that supplies high quality annotation and taxonomical information for biologically relevant compounds. However, ChEBI is expanded manually, which limits its growth because of the restricted availability of human resources. RESULTS: In this work, we describe a prototype that performs automatic classification of chemical compounds. The software we present implements a sound and complete reasoning procedure for a formalism that extends datalog and builds upon an off-the-shelf deductive database system. We capture a wide range of chemical classes that are not expressible with OWL-based formalisms, such as cyclic molecules, saturated molecules and alkanes. Furthermore, we describe a surface 'less-logician-like' syntax that allows application experts to create ontological descriptions of complex biochemical objects without prior knowledge of logic. In terms of performance, a noticeable improvement is observed in comparison with previous approaches. Our evaluation has discovered subsumptions that are missing from the manually curated ChEBI ontology as well as discrepancies with respect to existing subclass relations. We thus illustrate the potential of an ontology language suitable for the life sciences domain that exhibits a favourable balance between expressive power and practical feasibility. 
CONCLUSIONS: Our proposed methodology can form the basis of an ontology-mediated application to assist biocurators in the production of complete and error-free taxonomies. Moreover, such a tool could contribute to a more rapid development of the ChEBI ontology and to the efforts of the ChEBI team to make annotated chemical datasets available to the public. From a modelling point of view, our approach could stimulate the adoption of a different and expressive reasoning paradigm based on rules for which state-of-the-art and highly optimised reasoners are available; it could thus pave the way for the representation of a broader spectrum of life sciences and biomedical knowledge.

    Evaluation and cross-comparison of lexical entities of biological interest (LexEBI)

    MOTIVATION: Biomedical entities, their identifiers and names, are essential in the representation of biomedical facts and knowledge. In the same way, the complete set of biomedical and chemical terms, i.e. the biomedical "term space" (the "Lexeome"), forms a key resource to achieve the full integration of the scientific literature with biomedical data resources: any identified named entity can immediately be normalized to the correct database entry. This goal not only requires that we are aware of all existing terms, but would also profit from knowing all their senses and their semantic interpretation (ambiguities, nestedness). RESULT: This study compiles a resource for lexical terms of biomedical interest in a standard format (called "LexEBI"), and determines the overall number of terms, their reuse in different resources and the nestedness of terms. LexEBI comprises references for protein and gene entries and their term variants, and chemical entities, amongst other terms. In addition, disease terms have been identified from Medline and PubmedCentral and added to LexEBI. Our analysis demonstrates that the baseforms of terms from the different semantic types show only little polysemous use. Nonetheless, the term variants of protein and gene names (PGNs) frequently contain species mentions, which should have been avoided according to protein annotation guidelines. Furthermore, both the protein and gene entities and the chemical entities comprise enzymes, leading to hierarchical polysemy, and a large portion of PGNs make reference to a chemical entity. Altogether, according to our analysis based on the Medline distribution, 401,869 unique PGNs in the documents contain a reference to 25,022 chemical entities, 3,125 disease terms or 1,576 species mentions. CONCLUSION: LexEBI delivers the complete biomedical and chemical Lexeome in a standardized representation (http://www.ebi.ac.uk/Rebholz-srv/LexEBI/). 
The resource provides the disease terms as open source content, and fully interlinks terms across resources.

    When one Logic is Not Enough: Integrating First-order Annotations in OWL Ontologies

    In ontology development, there is a gap between domain ontologies which mostly use the web ontology language, OWL, and foundational ontologies written in first-order logic, FOL. To bridge this gap, we present Gavel, a tool that supports the development of heterogeneous 'FOWL' ontologies that extend OWL with FOL annotations, and is able to reason over the combined set of axioms. Since FOL annotations are stored in OWL annotations, FOWL ontologies remain compatible with the existing OWL infrastructure. We show that for the OWL domain ontology OBI, the stronger integration with its FOL top-level ontology BFO via our approach enables us to detect several inconsistencies. Furthermore, existing OWL ontologies can benefit from FOL annotations. We illustrate this with FOWL ontologies containing mereotopological axioms that enable new meaningful inferences. Finally, we show that even for large domain ontologies such as ChEBI, automatic reasoning with FOL annotations can be used to detect previously unnoticed errors in the classification

    Semantic Similarity for Automatic Classification of Chemical Compounds

    With the increasing amount of data made available in the chemical field, there is a strong need for systems capable of comparing and classifying chemical compounds in an efficient and effective way. The best approaches existing today are based on the structure-activity relationship premise, which states that the biological activity of a molecule is strongly related to its structural or physicochemical properties. This work presents a novel approach to the automatic classification of chemical compounds by integrating semantic similarity with existing structural comparison methods. Our approach was assessed using the Matthews Correlation Coefficient, and achieved values of 0.810 for the prediction of blood-brain barrier permeability, 0.694 for P-glycoprotein substrates, and 0.673 for estrogen receptor binding activity. These results represent a significant improvement over currently existing methods, whose best performances were 0.628, 0.591, and 0.647 respectively. It was demonstrated that the integration of semantic similarity is a feasible and effective way to improve existing chemical compound classification systems. Among other possible uses, this tool helps the study of the evolution of metabolic pathways, the study of the correlation of metabolic networks with properties of those networks, or the improvement of ontologies that represent chemical information.
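
For readers unfamiliar with the Matthews Correlation Coefficient used in the abstract above, the sketch below computes it from the four cells of a binary confusion matrix using the standard definition. The function name `mcc_from_counts` and the example counts are illustrative assumptions; they are not data from the paper.

```python
from math import sqrt


def mcc_from_counts(tp, tn, fp, fn):
    """Matthews Correlation Coefficient for a binary confusion matrix.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    By convention, 0.0 is returned when the denominator is zero.
    """
    numerator = tp * tn - fp * fn
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Made-up counts for a hypothetical permeability classifier.
mcc = mcc_from_counts(tp=45, tn=40, fp=10, fn=5)
print(f"MCC = {mcc:.3f}")
```

The coefficient ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), and it stays informative when the two classes differ in size, which plain accuracy does not.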