12 research outputs found

    GO faster ChEBI with Reasonable Biochemistry

    Chemical Entities of Biological Interest (ChEBI) is a database and ontology that represents biochemical knowledge about small molecules. Recent changes to the ontology have created new opportunities for automated reasoning with description logic that have not previously been fully exploited in chemistry. These changes open up the possibility of building an improved chemical semantic web by making more use of necessary and sufficient conditions, allowing reasoning about chemical structure, highlighting inconsistencies, and improving alignment with the Gene Ontology (GO). This paper briefly discusses some of the problems with reasoning over the current version of ChEBI and their potential solutions.

    The OBO Foundry: Coordinated Evolution of Ontologies to Support Biomedical Data Integration

    The value of any kind of data is greatly enhanced when it exists in a form that allows it to be integrated with other data. One approach to integration is through the annotation of multiple bodies of data using common controlled vocabularies or ‘ontologies’. Unfortunately, the very success of this approach has led to a proliferation of ontologies, which itself creates obstacles to integration. The Open Biomedical Ontologies (OBO) consortium has set in train a strategy to overcome this problem. Existing OBO ontologies, including the Gene Ontology, are undergoing a process of coordinated reform, and new ontologies are being created, on the basis of an evolving set of shared principles governing ontology development. The result is an expanding family of ontologies designed to be interoperable, logically well-formed, and to incorporate accurate representations of biological reality. We describe the OBO Foundry initiative and provide guidelines for those who might wish to become involved in the future.

    Statistical tests for associations between two directed acyclic graphs.

    Biological data, and particularly annotation data, are increasingly being represented in directed acyclic graphs (DAGs). However, while relevant biological information is implicit in the links between multiple domains, annotations from these different domains are usually represented in distinct, unconnected DAGs, making links between the domains represented difficult to determine. We develop a novel family of general statistical tests for the discovery of strong associations between two directed acyclic graphs. Our method takes the topology of the input graphs and the specificity and relevance of associations between nodes into consideration. We apply our method to the extraction of associations between biomedical ontologies in an extensive use-case. Through a manual and an automatic evaluation, we show that our tests discover biologically relevant relations. The suite of statistical tests we develop for this purpose is implemented and freely available for download

    Informatics Approaches to Linking Mutations to Biological Pathways, Networks and Clinical Data

    Indiana University-Purdue University Indianapolis (IUPUI). The information gained from sequencing of the human genome has begun to transform human biology and genetic medicine. The discovery of functionally important genetic variation lies at the heart of these endeavors, and there has been substantial progress in understanding the common patterns of single-nucleotide polymorphism (SNP), the most frequent type of variation in humans. Although more than 99% of human DNA sequences are the same across the population, variations in DNA sequence have a major impact on how humans respond to disease; to environmental entities such as bacteria, viruses, toxins, and chemicals; and to drugs and other therapies. Studying the differences between our genomes is therefore vital. This makes SNPs, as well as other genetic variation data, of great value for biomedical research and for developing pharmaceutical products or medical diagnostics. The goal of the project is to link genetic variation data to biological pathway and network data, and also to clinical data, to create a framework for translational and systems biology studies. The study of the interactions between the components of biological systems and biological pathways has become increasingly important. Scientists accept that it is as important to study different biological entities as interacting systems as it is to study them in isolation. This project is rooted in that thinking and aims to integrate a genetic variation dataset with a biological pathways dataset. Annotating genetic variation data with standardized disease notation is a very difficult yet important endeavor. One of the goals of this research is to identify whether informatics approaches can be applied to automatically annotate genetic variation data with a classification of diseases.

    Identification of OBO nonalignments and its implications for OBO enrichment

    Motivation: Existing projects that focus on the semiautomatic addition of links between existing terms in the Open Biomedical Ontologies can take advantage of reasoners that can make new inferences between terms that are based on the added formal definitions and that reflect nonalignments between the linked terms. However, these projects require that these definitions be necessary and sufficient, a strong requirement that often does not hold. If such definitions cannot be added, the reasoners cannot point to the nonalignments through the suggestion of new inferences

    Advancing translational research with the Semantic Web

    Background: A fundamental goal of the U.S. National Institutes of Health (NIH) "Roadmap" is to strengthen Translational Research, defined as the movement of discoveries in basic research to application at the clinical level. A significant barrier to translational research is the lack of uniformly structured data across related biomedical domains. The Semantic Web is an extension of the current Web that enables navigation and meaningful use of digital resources by automatic processes. It is based on common formats that support aggregation and integration of data drawn from diverse sources. A variety of technologies have been built on this foundation that, together, support identifying, representing, and reasoning across a wide range of biomedical data. The Semantic Web Health Care and Life Sciences Interest Group (HCLSIG), set up within the framework of the World Wide Web Consortium, was launched to explore the application of these technologies in a variety of areas. Subgroups focus on making biomedical data available in RDF, working with biomedical ontologies, prototyping clinical decision support systems, working on drug safety and efficacy communication, and supporting disease researchers navigating and annotating the large amount of potentially relevant literature. Results: We present a scenario that shows the value of the information environment the Semantic Web can support for aiding neuroscience researchers. We then report on several projects by members of the HCLSIG, in the process illustrating the range of Semantic Web technologies that have applications in areas of biomedicine. Conclusion: Semantic Web technologies present both promise and challenges. Current tools and standards are already adequate to implement components of the bench-to-bedside vision. On the other hand, these technologies are young. Gaps in standards and implementations still exist, and adoption is limited by typical problems with early technology, such as the need for a critical mass of practitioners and installed base, and growing pains as the technology is scaled up. Still, the potential of interoperable knowledge sources for biomedicine, at the scale of the World Wide Web, merits continued work.

    Ontologies in bioinformatics and systems biology

    Computer simulation is becoming a central scientific paradigm of systems biology and the basic tool for the theoretical study and understanding of the complex mechanisms of living systems. The growing number and complexity of these models creates a need for collaborative development, model reuse, model verification, and the description of computational experiments and their results. Ontological modeling is used to develop formats for knowledge-oriented mathematical modeling of biological systems. In this sense, the ontology associated with the entire set of formats supporting research in systems biology, in particular computer modeling of biological systems and processes, can be regarded as a first approximation to the ontology of systems biology. This review summarizes the features of the subject area (bioinformatics, systems biology, and biomedicine), the main motivation for the development of ontologies, and the most important examples of ontological modeling and semantic analysis at different levels of the hierarchy of knowledge: the molecular genetic level, the cellular level, and the tissue, organ, and whole-body levels. Bioinformatics and systems biology are an excellent testing ground for ontological modeling technologies and their efficient use. Several dozen verified basic reference ontologies now represent a source of knowledge for the integration and development of more complex domain models aimed at addressing specific issues in biomedicine and biotechnology. Further formalization and ontological accumulation of knowledge, together with the use of formal methods of analysis, can take the entire cycle of research in systems biology to a new technological level.

    Large Scale Data Analytics with Language Integrated Query

    Databases can easily reach petabytes in scale (one petabyte is 1,048,576 gigabytes). A system is needed that lets users efficiently retrieve or query data from multiple databases simultaneously. This research introduces a new, cloud-based query framework, designed and built using Language Integrated Query, to query existing data sources without the need to integrate or restructure existing databases. Protein data obtained through the query framework demonstrates its feasibility and cost effectiveness.