33 research outputs found
GO-WORDS: An Entropic Approach to Semantic Decomposition of Gene Ontology Terms
The Gene Ontology (GO) has a large and growing number of terms that constitute its vocabulary. An entropy-based approach is presented to automate the characterization of the compositional semantics of GO terms. The motivation is to extend the machine-readability of GO and to offer insights for the continued maintenance and growth of GO. A proto-type implementation illustrates the benefits of the approach
Transforming the Axiomisation of Ontologies: The Ontology Pre-Processor Language
As ontologies are developed there is a common need to transform them, especially from those that are axiomatically lean to those that are axiomatically rich. Such transformations often require large numbers of axioms to be generated that affect many different parts of the ontology. This paper describes the Ontology Pre-Processor Language (OPPL), a domain-specific macro language, based in the Manchester OWL Syntax, for manipulating ontologies written in OWL. OPPL instructions can add/remove entities, and add/remove axioms (semantics or annotations) to/from entities in an OWL ontology. OPPL is suitable for applying the same change to different ontologies or at different development stages, and for keeping track of the changes made (e.g. in pipelines). It is also suitable for defining independent modelling macros (e.g. Ontology Design Patterns) that can be applied at will and systematically across an ontology. The presented OPPL Instruction Manager is a Java library that processes OPPL instructions making the changes to an OWL ontology. A reference implementation that uses the OPPL Instruction Manager is also presented. The use of OPPL has been demonstrated in the Cell Cycle Ontolog
Dependence relationships between Gene Ontology terms based on TIGR gene product annotations
The Gene Ontology is an important tool for the representation and processing of information about gene products and functions. It provides controlled vocabularies for the designations of cellular components, molecular functions, and biological processes used in the annotation of genes and gene products. These constitute
three separate ontologies, of cellular components), molecular functions and biological processes, respectively. The question we address here is: how are the terms in these three separate ontologies related to each other? We use statistical methods and formal ontological principles as a first step towards finding answers to this question
Towards a proteomics meta-classification
that can serve as a foundation for more refined ontologies in the field of proteomics. Standard data sources classify proteins in terms of just one or two specific aspects. Thus SCOP (Structural Classification of Proteins) is described as classifying proteins on the basis of structural features; SWISSPROT annotates proteins on the basis of their structure and of parameters like post-translational modifications. Such data sources are connected to each other by pairwise term-to-term mappings. However, there are obstacles which stand in the way of combining them together to form a robust meta-classification of the needed sort. We discuss some formal ontological principles which
should be taken into account within the existing datasources in order to make such a metaclassification possible, taking into account also the Gene Ontology (GO) and its application to the annotation of proteins
On Carcinomas and Other Pathological Entities
Tumours, abscesses, cysts, scars and fractures are familiar types of what we shall
call pathological continuant entities. The instances of such types exist always in
or on anatomical structures, which thereby become transformed into
pathological anatomical structures of corresponding types: a fractured tibia, a blistered thumb, a
carcinomatous colon. In previous work on biomedical ontologies we showed how the provision
of formal definitions for relations such as is_a, part_of and transformation_of
can facilitate the integration of such ontologies in ways which have the potential
to support new kinds of automated reasoning. We here extend this approach to the
treatment of pathologies, focusing especially on those pathological continuant entities
which arise when organs become affected by carcinomas
Obol: Integrating Language and Meaning in Bio-Ontologies
Ontologies are intended to capture and formalize a domain of knowledge. The
ontologies comprising the Open Biological Ontologies (OBO) project, which includes
the Gene Ontology (GO), are formalizations of various domains of biological
knowledge. Ontologies within OBO typically lack computable definitions that serve to
differentiate a term from other similar terms. The computer is unable to determine the
meaning of a term, which presents problems for tools such as automated reasoners.
Reasoners can be of enormous benefit in managing a complex ontology. OBO term
names frequently implicitly encode the kind of definitions that can be used by
computational tools, such as automated reasoners. The definitions encoded in the
names are not easily amenable to computation, because the names are ostensibly
natural language phrases designed for human users. These names are highly regular
in their grammar, and can thus be treated as valid sentences in some formal or
computable language.With a description of the rules underlying this formal language,
term names can be parsed to derive computable definitions, which can then be
reasoned over. This paper describes the effort to elucidate that language, called Obol,
and the attempts to reason over the resulting definitions. The current implementation
finds unique non-trivial definitions for around half of the terms in the GO, and
has been used to find 223 missing relationships, which have since been added to
the ontology. Obol has utility as an ontology maintenance tool, and as a means of
generating computable definitions for a whole ontology
Instance-Based Matching of Large Life Science Ontologies
Ontologies are heavily used in life sciences so that there is increasing value to match different ontologies in order to determine related conceptual categories. We propose a simple yet powerful methodology for instance-based ontology matching which utilizes the associations between molecular-biological objects and ontologies. The approach can build on many existing ontology associations for instance objects like sequences and proteins and thus makes heavy use of available domain knowledge. Furthermore, the approach is flexible and extensible since each instance source with associations to the ontologies of interest can contribute to the ontology mapping. We study several approaches to determine the instance-based similarity of ontology categories. We perform an extensive experimental evaluation to use protein associations for different species to match between subontologies of the Gene Ontology and OMIM. We also provide a comparison with metadata-based ontology matching
An analysis of gene/protein associations at PubMed scale
<p>Abstract</p> <p>Background</p> <p>Event extraction following the GENIA Event corpus and BioNLP shared task models has been a considerable focus of recent work in biomedical information extraction. This work includes efforts applying event extraction methods to the entire PubMed literature database, far beyond the narrow subdomains of biomedicine for which annotated resources for extraction method development are available.</p> <p>Results</p> <p>In the present study, our aim is to estimate the coverage of all statements of gene/protein associations in PubMed that existing resources for event extraction can provide. We base our analysis on a recently released corpus automatically annotated for gene/protein entities and syntactic analyses covering the entire PubMed, and use named entity co-occurrence, shortest dependency paths and an unlexicalized classifier to identify likely statements of gene/protein associations. A set of high-frequency/high-likelihood association statements are then manually analyzed with reference to the GENIA ontology.</p> <p>Conclusions</p> <p>We present a first estimate of the overall coverage of gene/protein associations provided by existing resources for event extraction. Our results suggest that for event-type associations this coverage may be over 90%. We also identify several biologically significant associations of genes and proteins that are not addressed by these resources, suggesting directions for further extension of extraction coverage.</p