5,083 research outputs found

    Analysis of the human diseasome reveals phenotype modules across common, genetic, and infectious diseases

    Get PDF
    Phenotypes are the observable characteristics of an organism arising from its response to the environment. Phenotypes associated with engineered and natural genetic variation are widely recorded using phenotype ontologies in model organisms, as are signs and symptoms of human Mendelian diseases in databases such as OMIM and Orphanet. Exploiting these resources, several computational methods have been developed for integration and analysis of phenotype data to identify the genetic etiology of diseases or suggest plausible interventions. A similar resource would be highly useful not only for rare and Mendelian diseases, but also for common, complex and infectious diseases. We apply a semantic text- mining approach to identify the phenotypes (signs and symptoms) associated with over 8,000 diseases. We demonstrate that our method generates phenotypes that correctly identify known disease-associated genes in mice and humans with high accuracy. Using a phenotypic similarity measure, we generate a human disease network in which diseases that share signs and symptoms cluster together, and we use this network to identify phenotypic disease modules

    Ontology ranking based on the analysis of concept structures

    Get PDF
    In view of the need to provide tools to facilitate the reuse of existing knowledge structures such as ontologies, we present in this paper a system, AKTiveRank, for the ranking of ontologies. AKTiveRank uses as input the search terms provided by a knowledge engineer and, using the output of an ontology search engine, ranks the ontologies. We apply a number of classical metrics in an attempt to investigate their appropriateness for ranking ontologies, and compare the results with a questionnaire-based human study. Our results show that AKTiveRank will have great utility although there is potential for improvement

    Pragmatic Ontology Evolution: Reconciling User Requirements and Application Performance

    Get PDF
    Increasingly, organizations are adopting ontologies to describe their large catalogues of items. These ontologies need to evolve regularly in response to changes in the domain and the emergence of new requirements. An important step of this process is the selection of candidate concepts to include in the new version of the ontology. This operation needs to take into account a variety of factors and in particular reconcile user requirements and application performance. Current ontology evolution methods focus either on ranking concepts according to their relevance or on preserving compatibility with existing applications. However, they do not take in consideration the impact of the ontology evolution process on the performance of computational tasks ā€“ e.g., in this work we focus on instance tagging, similarity computation, generation of recommendations, and data clustering. In this paper, we propose the Pragmatic Ontology Evolution (POE) framework, a novel approach for selecting from a group of candidates a set of concepts able to produce a new version of a given ontology that i) is consistent with the a set of user requirements (e.g., max number of concepts in the ontology), ii) is parametrised with respect to a number of dimensions (e.g., topological considerations), and iii) effectively supports relevant computational tasks. Our approach also supports users in navigating the space of possible solutions by showing how certain choices, such as limiting the number of concepts or privileging trendy concepts rather than historical ones, would reflect on the application performance. An evaluation of POE on the real-world scenario of the evolving Springer Nature taxonomy for editorial classification yielded excellent results, demonstrating a significant improvement over alternative approaches

    htsint: a Python library for sequencing pipelines that combines data through gene set generation

    Get PDF
    Background: Sequencing technologies provide a wealth of details in terms of genes, expression, splice variants, polymorphisms, and other features. A standard for sequencing analysis pipelines is to put genomic or transcriptomic features into a context of known functional information, but the relationships between ontology terms are often ignored. For RNA-Seq, considering genes and their genetic variants at the group level enables a convenient way to both integrate annotation data and detect small coordinated changes between experimental conditions, a known caveat of gene level analyses. Results: We introduce the high throughput data integration tool, htsint, as an extension to the commonly used gene set enrichment frameworks. The central aim of htsint is to compile annotation information from one or more taxa in order to calculate functional distances among all genes in a specified gene space. Spectral clustering is then used to partition the genes, thereby generating functional modules. The gene space can range from a targeted list of genes, like a specific pathway, all the way to an ensemble of genomes. Given a collection of gene sets and a count matrix of transcriptomic features (e.g. expression, polymorphisms), the gene sets produced by htsint can be tested for 'enrichment' or conditional differences using one of a number of commonly available packages. Conclusion: The database and bundled tools to generate functional modules were designed with sequencing pipelines in mind, but the toolkit nature of htsint allows it to also be used in other areas of genomics. The software is freely available as a Python library through GitHub at https://github.com/ajrichards/htsint

    The role of ontologies in biological and biomedical research: a functional perspective.

    Get PDF
    Ontologies are widely used in biological and biomedical research. Their success lies in their combination of four main features present in almost all ontologies: provision of standard identifiers for classes and relations that represent the phenomena within a domain; provision of a vocabulary for a domain; provision of metadata that describes the intended meaning of the classes and relations in ontologies; and the provision of machine-readable axioms and definitions that enable computational access to some aspects of the meaning of classes and relations. While each of these features enables applications that facilitate data integration, data access and analysis, a great potential lies in the possibility of combining these four features to support integrative analysis and interpretation of multimodal data. Here, we provide a functional perspective on ontologies in biology and biomedicine, focusing on what ontologies can do and describing how they can be used in support of integrative research. We also outline perspectives for using ontologies in data-driven science, in particular their application in structured data mining and machine learning applications.This is the final version of the article. It first appeared from Oxford University Press via http://dx.doi.org/10.1093/bib/bbv01

    DynGO: a tool for visualizing and mining of Gene Ontology and its associations

    Get PDF
    BACKGROUND: A large volume of data and information about genes and gene products has been stored in various molecular biology databases. A major challenge for knowledge discovery using these databases is to identify related genes and gene products in disparate databases. The development of Gene Ontology (GO) as a common vocabulary for annotation allows integrated queries across multiple databases and identification of semantically related genes and gene products (i.e., genes and gene products that have similar GO annotations). Meanwhile, dozens of tools have been developed for browsing, mining or editing GO terms, their hierarchical relationships, or their "associated" genes and gene products (i.e., genes and gene products annotated with GO terms). Tools that allow users to directly search and inspect relations among all GO terms and their associated genes and gene products from multiple databases are needed. RESULTS: We present a standalone package called DynGO, which provides several advanced functionalities in addition to the standard browsing capability of the official GO browsing tool (AmiGO). DynGO allows users to conduct batch retrieval of GO annotations for a list of genes and gene products, and semantic retrieval of genes and gene products sharing similar GO annotations. The result are shown in an association tree organized according to GO hierarchies and supported with many dynamic display options such as sorting tree nodes or changing orientation of the tree. For GO curators and frequent GO users, DynGO provides fast and convenient access to GO annotation data. DynGO is generally applicable to any data set where the records are annotated with GO terms, as illustrated by two examples. CONCLUSION: We have presented a standalone package DynGO that provides functionalities to search and browse GO and its association databases as well as several additional functions such as batch retrieval and semantic retrieval. The complete documentation and software are freely available for download from the website

    Benchmarking some Portuguese S&T system research units: 2nd Edition

    Full text link
    The increasing use of productivity and impact metrics for evaluation and comparison, not only of individual researchers but also of institutions, universities and even countries, has prompted the development of bibliometrics. Currently, metrics are becoming widely accepted as an easy and balanced way to assist the peer review and evaluation of scientists and/or research units, provided they have adequate precision and recall. This paper presents a benchmarking study of a selected list of representative Portuguese research units, based on a fairly complete set of parameters: bibliometric parameters, number of competitive projects and number of PhDs produced. The study aimed at collecting productivity and impact data from the selected research units in comparable conditions i.e., using objective metrics based on public information, retrievable on-line and/or from official sources and thus verifiable and repeatable. The study has thus focused on the activity of the 2003-06 period, where such data was available from the latest official evaluation. The main advantage of our study was the application of automatic tools, achieving relevant results at a reduced cost. Moreover, the results over the selected units suggest that this kind of analyses will be very useful to benchmark scientific productivity and impact, and assist peer review.Comment: 26 pages, 20 figures F. Couto, D. Faria, B. Tavares, P. Gon\c{c}alves, and P. Verissimo, Benchmarking some portuguese S\&T system research units: 2nd edition, DI/FCUL TR 13-03, Department of Informatics, University of Lisbon, February 201

    FunSimMat update: new features for exploring functional similarity

    Get PDF
    Quantifying the functional similarity of genes and their products based on Gene Ontology annotation is an important tool for diverse applications like the analysis of gene expression data, the prediction and validation of protein functions and interactions, and the prioritization of disease genes. The Functional Similarity Matrix (FunSimMat, http://www.funsimmat.de) is a comprehensive database providing various precomputed functional similarity values for proteins in UniProtKB and for protein families in Pfam and SMART. With this update, we significantly increase the coverage of FunSimMat by adding data from the Gene Ontology Annotation project as well as new functional similarity measures. The applicability of the database is greatly extended by the implementation of a new Gene Ontology-based method for disease gene prioritization. Two new visualization tools allow an interactive analysis of the functional relationships between proteins or protein families. This is enhanced further by the introduction of an automatically derived hierarchy of annotation classes. Additional changes include a revised user front-end and a new RESTlike interface for improving the user-friendliness and online accessibility of FunSimMat

    Behavior change interventions: the potential of ontologies for advancing science and practice

    Get PDF
    A central goal of behavioral medicine is the creation of evidence-based interventions for promoting behavior change. Scientific knowledge about behavior change could be more effectively accumulated using "ontologies." In information science, an ontology is a systematic method for articulating a "controlled vocabulary" of agreed-upon terms and their inter-relationships. It involves three core elements: (1) a controlled vocabulary specifying and defining existing classes; (2) specification of the inter-relationships between classes; and (3) codification in a computer-readable format to enable knowledge generation, organization, reuse, integration, and analysis. This paper introduces ontologies, provides a review of current efforts to create ontologies related to behavior change interventions and suggests future work. This paper was written by behavioral medicine and information science experts and was developed in partnership between the Society of Behavioral Medicine's Technology Special Interest Group (SIG) and the Theories and Techniques of Behavior Change Interventions SIG. In recent years significant progress has been made in the foundational work needed to develop ontologies of behavior change. Ontologies of behavior change could facilitate a transformation of behavioral science from a field in which data from different experiments are siloed into one in which data across experiments could be compared and/or integrated. This could facilitate new approaches to hypothesis generation and knowledge discovery in behavioral science
    • ā€¦
    corecore