1,874 research outputs found

    Improvement of KiMoSys framework for kinetic modelling

    Over the past years, the growing volume of biological data has underscored the importance of data repositories. Databases make it easier for the scientific community to reuse and share research data. Among their most important features are quick access to data, described by metadata and available in standard formats, and compliance with the FAIR (Findable, Accessible, Interoperable and Reusable) Data Principles for data management. KiMoSys (https://kimosys.org) is a public domain-specific repository of experimental data, containing concentration data of enzymes and metabolites as well as flux data. It offers a web-based interface and upload facility to publish data, making them accessible in standard formats, while also integrating kinetic models related to the data. This thesis contributes to the improvement and extension of KiMoSys. It adds more downloadable data formats, data visualization, more tools to filter data, a simulation environment for kinetic models and a unique persistent identifier system. The result is a new version of KiMoSys with a renewed interface, multiple new features, and enhancements to the previously existing ones, all in accordance with all FAIR data principles. KiMoSys v2.0 is therefore expected to be an important tool for the systems biology modeling community.

    Classification of the Universe of Immune Epitope Literature: Representation and Knowledge Gaps

    A significant fraction of the more than 18 million scientific articles currently indexed in the PubMed database are related to immune responses to various agents, including infectious microbes, autoantigens, allergens, transplants, cancer antigens and others. The Immune Epitope Database (IEDB) is an online repository that catalogs immune epitope reactivity data derived from articles listed in the National Library of Medicine PubMed database. The IEDB is maintained and continually updated by monitoring PubMed for new, potentially relevant references. Herein we detail the classification of all epitope-specific literature in over 100 different immunological domains representing Infectious Diseases and Microbes, Autoimmunity, Allergy, Transplantation and Cancer. The relative number of references in each category reflects past and present areas of research on immune reactivities. In addition to describing the overall landscape of data distribution, this characterization of the epitope reference data also allows for the exploration of possible correlations with global disease morbidity and mortality data. While in most cases diseases associated with high morbidity and mortality rates were amongst the most studied, a number of high-impact diseases such as dengue, Schistosoma, HSV-2, B. pertussis and Chlamydia trachoma were found to have very little coverage. The data analyzed in this fashion represent the first estimate of how reported immunological data correspond to disease-related morbidity and mortality, and confirm significant discrepancies between the overall research foci and disease burden, thus identifying important gaps to be pursued by future research. These findings may also provide a justification for redirecting a portion of research funds into some of the underfunded, critical disease areas.

    Mining the Gene Wiki for functional genomic knowledge

    Background: Ontology-based gene annotations are important tools for organizing and analyzing genome-scale biological data. Collecting these annotations is a valuable but costly endeavor. The Gene Wiki makes use of Wikipedia as a low-cost, mass-collaborative platform for assembling text-based gene annotations. The Gene Wiki is comprised of more than 10,000 review articles, each describing one human gene. The goal of this study is to define and assess a computational strategy for translating the text of Gene Wiki articles into ontology-based gene annotations. We specifically explore the generation of structured annotations using the Gene Ontology and the Human Disease Ontology. Results: Our system produced 2,983 candidate gene annotations using the Disease Ontology and 11,022 candidate annotations using the Gene Ontology from the text of the Gene Wiki. Based on manual evaluations and comparisons to reference annotation sets, we estimate a precision of 90-93% for the Disease Ontology annotations and 48-64% for the Gene Ontology annotations. We further demonstrate that this data set can systematically improve the results from gene set enrichment analyses. Conclusions: The Gene Wiki is a rapidly growing corpus of text focused on human gene function. Here, we demonstrate that the Gene Wiki can be a powerful resource for generating ontology-based gene annotations. These annotations can be used immediately to improve workflows for building curated gene annotation databases and knowledge-based statistical analyses.

    Predicting Gene Ontology Annotations Based on Literature Co-Occurrence

    In recent years, the amount of digital data that we produce has increased exponentially. This flood of information, often referred to as “big data,” is creating both opportunities and challenges in all areas of life. In the domain of biology, technology has enabled us to sequence the genomes of humans and many other organisms, but we are far from understanding the biological roles played by all of these genes. The Gene Ontology seeks to address this problem by annotating genes to terms describing biological processes, molecular functions, and cellular components. However, the ontology’s manual curators cannot keep up with the rate at which information is being discovered and published. Hence, there is a need for computational methods that can rapidly process the biomedical literature and suggest new annotations for verification. This study uses support vector machines to predict Gene Ontology annotations for Saccharomyces cerevisiae (yeast). I tested the usefulness of two types of literature features: co-occurrence of gene names in articles, and co-occurrence in abstracts of gene names with keywords taken from GO term definitions. My results demonstrate that support vector machines using literature co-occurrence data as features can predict GO annotations with high accuracy. In many cases where simple gene-gene co-occurrence does not work well, better results can be obtained using gene-keyword co-occurrence. I found that a very simple text mining strategy — identifying words that occur in only one GO term definition — was an effective way of choosing keywords. Although predictions based on gene-gene co-occurrence and those based on gene-keyword co-occurrence were highly correlated, there are terms for which one set of predictions was significantly more accurate than the other. I was able to combine the two sets of predictions effectively using a voting scheme in which gene-gene predictions were weighted at 70% and gene-keyword predictions at 30%
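The 70%/30% voting scheme described in this abstract can be sketched as a weighted average of two classifier scores. This is a minimal illustration, not the author's implementation: the abstract gives only the weights, so the function name `combine_votes`, the assumption that each predictor yields a score in [0, 1], and the 0.5 decision threshold are all hypothetical.

```python
def combine_votes(gene_gene_score, gene_keyword_score,
                  w_gene_gene=0.7, w_gene_keyword=0.3, threshold=0.5):
    """Combine two prediction scores by weighted voting.

    Assumes both scores lie in [0, 1]. The weights come from the abstract;
    the threshold is an illustrative assumption, not a value from the study.
    """
    combined = w_gene_gene * gene_gene_score + w_gene_keyword * gene_keyword_score
    # Predict the GO annotation when the weighted score reaches the threshold.
    return combined, combined >= threshold


if __name__ == "__main__":
    # Gene-gene predictor is confident, gene-keyword predictor is not.
    score, predicted = combine_votes(0.9, 0.2)
    print(f"combined score = {score:.2f}, annotate = {predicted}")
```

With these weights, a confident gene-gene prediction can carry the vote on its own, while a gene-keyword prediction alone cannot reach the assumed threshold.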

    The Symbiosis Interactome: a computational approach reveals novel components, functional interactions and modules in Sinorhizobium meliloti

    Background: Rhizobium-Legume symbiosis is an attractive biological process that has been studied for decades because of its importance in agriculture. However, this system has undergone extensive study and although many of the major factors underpinning the process have been discovered using traditional methods, much remains to be discovered. Results: Here we present an analysis of the 'Symbiosis Interactome' using novel computational methods in order to address the complex dynamic interactions between proteins involved in the symbiosis of the model bacteria Sinorhizobium meliloti with its plant hosts. Our study constitutes the first large-scale analysis attempting to reconstruct this complex biological process, and to identify novel proteins involved in establishing symbiosis. We identified 263 novel proteins potentially associated with the Symbiosis Interactome. The topology of the Symbiosis Interactome was used to guide experimental techniques attempting to validate novel proteins involved in different stages of symbiosis. The contribution of a set of novel proteins was tested analyzing the symbiotic properties of several S. meliloti mutants. We found mutants with altered symbiotic phenotypes suggesting novel proteins that provide key complementary roles for symbiosis. Conclusion: Our 'systems-based model' represents a novel framework for studying host-microbe interactions, provides a theoretical basis for further experimental validations, and can also be applied to the study of other complex processes such as diseases.

    The Resource Identification Initiative: A cultural shift in publishing

    A central tenet in support of research reproducibility is the ability to uniquely identify research resources, i.e., reagents, tools, and materials that are used to perform experiments. However, current reporting practices for research resources are insufficient to allow humans and algorithms to identify the exact resources that are reported or answer basic questions such as "What other studies used resource X?" To address this issue, the Resource Identification Initiative was launched as a pilot project to improve the reporting standards for research resources in the methods sections of papers and thereby improve identifiability and reproducibility. The pilot engaged over 25 biomedical journal editors from most major publishers, as well as scientists and funding officials. Authors were asked to include Research Resource Identifiers (RRIDs) in their manuscripts prior to publication for three resource types: antibodies, model organisms, and tools (including software and databases). RRIDs represent accession numbers assigned by an authoritative database, e.g., the model organism databases, for each type of resource. To make it easier for authors to obtain RRIDs, resources were aggregated from the appropriate databases and their RRIDs made available in a central web portal (www.scicrunch.org/resources). RRIDs meet three key criteria: they are machine readable, free to generate and access, and are consistent across publishers and journals. The pilot was launched in February of 2014 and over 300 papers have appeared that report RRIDs. The number of journals participating has expanded from the original 25 to more than 40. Here, we present an overview of the pilot project and its outcomes to date. We show that authors are generally accurate in performing the task of identifying resources and supportive of the goals of the project. We also show that identifiability of the resources pre- and post-pilot showed a dramatic improvement for all three resource types, suggesting that the project has had a significant impact on reproducibility relating to research resources.

    Provenance, propagation and quality of biological annotation

    PhD Thesis. Biological databases have become an integral part of the life sciences, being used to store, organise and share ever-increasing quantities and types of data. Biological databases are typically centred around raw data, with individual entries being assigned to a single piece of biological data, such as a DNA sequence. Although the raw data are essential, a reader can obtain little information from them alone. Therefore, many databases aim to supplement their entries with annotation, allowing the current knowledge about the underlying data to be conveyed to a reader. Although annotations come in many different forms, most databases provide some form of free text annotation. Given that annotations can form the foundations of future work, it is important that a user is able to evaluate the quality and correctness of an annotation. However, this is rarely straightforward. The amount of annotation, and the way in which it is curated, varies between databases. For example, the production of an annotation in some databases is entirely automated, without any manual intervention. Further, sections of annotations may be reused, being propagated between entries and, potentially, external databases. This provenance and curation information is not always apparent to a user. The work described within this thesis explores issues relating to biological annotation quality. While the most valuable annotation is often contained within free text, its lack of structure makes it hard to assess. Initially, this work describes a generic approach that allows textual annotations to be quantitatively measured. This approach is based upon the application of Zipf's Law to words within textual annotation, resulting in a single value. The relationship between this value and Zipf's principle of least effort provides an indication as to the annotation's quality, whilst also allowing annotations to be quantitatively compared. Secondly, the thesis focuses on determining annotation provenance and tracking any subsequent propagation. This is achieved through the development of a visualisation framework, which exploits the reuse of sentences within annotations. Utilising this framework, a number of propagation patterns were identified, which on analysis appear to indicate low quality and erroneous annotation. Together, these approaches increase our understanding of the textual characteristics of biological annotation, and suggest that this understanding can be used to increase the overall quality of these resources.
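As a rough illustration of reducing a textual annotation to a single Zipf-style value, the sketch below estimates the exponent of a word rank-frequency distribution. It is not the thesis's method: the abstract does not say how the value is fitted, so a plain least-squares fit on the log-log rank-frequency curve is swapped in here, and the function name `zipf_exponent` and the whitespace tokenisation are assumptions.

```python
import math
from collections import Counter


def zipf_exponent(text):
    """Estimate the Zipf exponent of a text by least squares in log-log space.

    Assumes frequency ~ C / rank**s and returns s. Tokenisation is a simple
    lowercase whitespace split, chosen only for illustration.
    """
    words = text.lower().split()
    # Word frequencies sorted from most to least common (rank 1, 2, 3, ...).
    freqs = sorted(Counter(words).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct words to fit a slope")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # The fitted slope is negative for a decaying distribution; negate it.
    return -sxy / sxx


if __name__ == "__main__":
    annotation = "binds atp binds dna and binds atp in the nucleus"
    print(f"estimated exponent: {zipf_exponent(annotation):.3f}")
```

Because the output is a single number per text, annotations of different sizes or from different databases can be compared directly, which is the property the abstract highlights.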