
    Computer-assisted curation of a human regulatory core network from the biological literature

    Motivation: A highly interlinked network of transcription factors (TFs) orchestrates the context-dependent expression of human genes. ChIP-chip experiments that interrogate the binding of particular TFs to genomic regions are used to reconstruct gene regulatory networks at genome scale, but they are plagued by high false-positive rates. Meanwhile, a large body of knowledge on high-quality regulatory interactions remains largely unexplored, because it is available only as natural-language descriptions scattered over millions of scientific publications. Such data are hard to extract, and regulatory databases currently contain together only 503 regulatory relations between human TFs. Results: We developed a text-mining-assisted workflow to systematically extract knowledge about regulatory interactions between human TFs from the biological literature. We applied this workflow to the entire Medline, which helped us to identify more than 45 000 sentences potentially describing such relationships. We ranked these sentences with a machine-learning approach. The top 2500 sentences contained ∼900 sentences encompassing relations already known in databases. By manually curating the remaining 1625 top-ranking sentences, we obtained more than 300 validated regulatory relationships that were not previously present in any regulatory database. Full-text curation allowed us to record detailed information on the strength of the experimental evidence supporting each relationship. Conclusions: We were able to increase curated information about the human core transcriptional network by >60% compared with the current content of regulatory databases. We observed improved performance over the state of the art when using the network for disease gene prioritization. Availability and implementation: The web service is freely accessible at http://fastforward.sys-bio.net/.
    FWN – Publications without appointment, Leiden University
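    The filter-then-rank step described above can be illustrated with a minimal sketch. It assumes a tiny hypothetical TF lexicon and a hypothetical list of trigger verbs, and it replaces the authors' machine-learning ranker with a toy score (the count of trigger words). It is not the published workflow, only an illustration of keeping sentences that co-mention two TFs and ordering them by a relevance score.

```python
import re
from itertools import combinations

# Hypothetical TF lexicon; a real pipeline would use a curated
# gene/protein dictionary with synonyms and normalization.
TF_NAMES = {"MYC", "TP53", "E2F1", "SP1", "STAT3"}

# Hypothetical trigger words hinting at a regulatory relation.
TRIGGERS = {"activates", "represses", "induces", "regulates", "binds", "inhibits"}


def candidate_pairs(sentence):
    """Return every unordered pair of lexicon TFs co-mentioned in a sentence."""
    tokens = re.findall(r"[A-Za-z0-9]+", sentence)
    found = sorted({t for t in tokens if t in TF_NAMES})
    return list(combinations(found, 2))


def score(sentence):
    """Toy relevance score: how many distinct trigger words occur.
    Stands in for a trained classifier (an assumption of this sketch)."""
    words = {w.lower() for w in re.findall(r"[A-Za-z]+", sentence)}
    return len(words & TRIGGERS)


def rank_sentences(sentences, top_k):
    """Keep sentences containing at least one TF pair, rank by score."""
    candidates = [s for s in sentences if candidate_pairs(s)]
    return sorted(candidates, key=score, reverse=True)[:top_k]


if __name__ == "__main__":
    demo = [
        "E2F1 activates TP53 transcription in response to DNA damage.",
        "MYC and SP1 were both expressed in the sample.",
        "STAT3 binds the MYC promoter and induces its expression.",
        "TP53 levels were measured by western blot.",
    ]
    for s in rank_sentences(demo, 2):
        print(score(s), candidate_pairs(s), s)
```

    Sentences mentioning only one TF are dropped before ranking. Python's sort is stable, so sentences with equal scores keep their input order.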

    Text mining for biology - the way forward: opinions from leading scientists

    This article collects opinions from leading scientists about how text mining can provide better access to the biological literature, how the scientific community can help with this process, what the next steps are, and what role future BioCreative evaluations can play. The responses identify several broad themes: the possibility of fusing literature and biological databases through text mining; the need for user interfaces tailored to different classes of users and supporting community-based annotation; the importance of scaling text-mining technology and inserting it into larger workflows; and suggestions for additional challenge evaluations, new applications, and additional resources needed to make progress.

    RegenBase: a knowledge base of spinal cord injury biology for translational research.

    Spinal cord injury (SCI) research is a data-rich field that aims to identify the biological mechanisms that cause loss of function and mobility after SCI, and to develop therapies that promote recovery after injury. SCI experimental methods, data and domain knowledge are locked in the largely unstructured text of scientific publications, which makes large-scale integration with existing bioinformatics resources, and any subsequent analysis, infeasible. The lack of standard reporting for experimental variables and results also makes replicating experiments a significant challenge. To address these challenges, we have developed RegenBase, a knowledge base of SCI biology. RegenBase integrates curated literature-sourced facts and experimental details, raw assay data profiling the effect of compounds on enzyme activity and cell growth, and structured SCI domain knowledge in the form of the first ontology for SCI, using Semantic Web representation languages and frameworks. RegenBase uses consistent identifier schemes and data representations that enable automated linking among RegenBase statements and to other biological databases and electronic resources. By querying RegenBase, we have identified novel biological hypotheses linking the effects of perturbagens to observed behavioral outcomes after SCI. RegenBase is publicly available for browsing, querying and download. Database URL: http://regenbase.org
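    The benefit of consistent identifiers can be shown with a small sketch, assuming an RDF-style subject-predicate-object model. The triples below are invented placeholders (the `ex:` names are hypothetical, not RegenBase IRIs), and the two-hop join only mimics a SPARQL basic graph pattern in plain Python. The point is that a statement from assay data and one from curated literature link automatically because they share the identifier `ex:kinaseA`.

```python
# Toy triple store of (subject, predicate, object) statements.
# All identifiers are hypothetical placeholders, not RegenBase IRIs.
TRIPLES = [
    ("ex:compound1", "ex:inhibits", "ex:kinaseA"),            # e.g. from assay data
    ("ex:compound2", "ex:inhibits", "ex:kinaseB"),            # e.g. from assay data
    ("ex:kinaseA", "ex:linkedTo", "ex:improvedLocomotion"),   # e.g. curated literature
]


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def two_hop(triples, p1, p2):
    """Answer the pattern  ?x p1 ?y . ?y p2 ?z  by joining on ?y."""
    results = []
    for x, _, y in match(triples, p=p1):
        for _, _, z in match(triples, s=y, p=p2):
            results.append((x, y, z))
    return results


if __name__ == "__main__":
    # Which perturbagens reach an outcome through a shared target?
    print(two_hop(TRIPLES, "ex:inhibits", "ex:linkedTo"))
```

    Here `ex:compound2` yields no result because nothing links `ex:kinaseB` to an outcome. A production knowledge base would run the equivalent SPARQL query against a triple store instead.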

    A Bioinformatics-Assisted Review on Iron Metabolism and Immune System to Identify Potential Biomarkers of Exercise Stress-Induced Immunosuppression

    Immune function is closely related to iron (Fe) homeostasis and allostasis. The aim of this bioinformatics-assisted review was twofold: (i) to update the current knowledge of Fe metabolism and its relationship to the immune system, and (ii) to perform a prediction analysis of regulatory network hubs that might serve as potential biomarkers during stress-induced immunosuppression. Several literature and bioinformatics databases and repositories were used to review Fe metabolism and to complement the molecular description of prioritized proteins. The Search Tool for the Retrieval of Interacting Genes (STRING) was used to build a protein-protein interaction network for subsequent network topology analysis. Importantly, Fe is a sensitive double-edged sword: both extremes of its nutritional status may have harmful effects on innate and adaptive immunity. We identified clearly connected, important hubs belonging to two clusters: (i) presentation of peptide antigens to the immune system, involving redox reactions of Fe, heme, and Fe trafficking/transport; and (ii) ubiquitination, endocytosis, and degradation of proteins related to Fe metabolism in immune cells (e.g., macrophages). The identified potential biomarkers agree with current experimental evidence, are included in several immunological/biomarker databases, and/or are emerging genetic markers for different stressful conditions. Although further validation is warranted, this hybrid method (human-machine collaboration) for extracting meaningful biological applications from available literature data and bioinformatics tools deserves attention.
    The ‘Bioinformatics-assisted Review’ is a project developed and supported by the Research Division at the Dynamical Business and Science Society—DBSS International SAS.
The APC was funded by the Exercise & Sport Nutrition Laboratory (ESNL) at Texas A&M University, the POWER LAB at University of Central Florida and the Sport Genomics Research Group at University of the Basque Country UPV/EHU
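    Hub detection in a protein-protein interaction network can be sketched with degree centrality. This is one simple topology measure; the review's exact metrics are not stated here. The sketch assumes STRING-style edges carrying a combined confidence score in [0, 1] and a hypothetical cut-off of 0.7. The node names are toy placeholders, not the proteins identified in the review.

```python
from collections import defaultdict


def degree_hubs(edges, top_n, min_score=0.7):
    """Rank nodes of an undirected network by degree and return the top_n.

    edges: iterable of (protein_a, protein_b, score); edges scoring below
    min_score (an assumed confidence cut-off) are ignored. Duplicate edges
    and self-loops do not inflate the degree.
    """
    neighbours = defaultdict(set)
    for a, b, score in edges:
        if score >= min_score and a != b:
            neighbours[a].add(b)
            neighbours[b].add(a)
    # Sort by descending degree; break ties alphabetically for stable output.
    ranked = sorted(neighbours, key=lambda n: (-len(neighbours[n]), n))
    return [(n, len(neighbours[n])) for n in ranked[:top_n]]


if __name__ == "__main__":
    toy_edges = [
        ("A", "B", 0.90), ("A", "C", 0.80), ("A", "D", 0.75),
        ("B", "C", 0.95), ("D", "E", 0.72), ("C", "E", 0.40),
    ]
    print(degree_hubs(toy_edges, 2))
```

    Lowering `min_score` admits weaker edges and can reshuffle the hub ranking, so the threshold is worth reporting alongside any hub list.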

    PIRSF Family Classification System for Protein Functional and Evolutionary Analysis

    The PIRSF protein classification system (http://pir.georgetown.edu/pirsf/) reflects evolutionary relationships of full-length proteins and domains. The primary PIRSF classification unit is the homeomorphic family, whose members are both homologous (evolved from a common ancestor) and homeomorphic (sharing full-length sequence similarity and a common domain architecture). PIRSF families are curated systematically based on literature review and integrative sequence and functional analysis, including sequence and structure similarity, domain architecture, functional association, genome context, and phyletic pattern. The results of classification and expert annotation are summarized in PIRSF family reports, with graphical viewers for taxonomic distribution, domain architecture, family hierarchy, multiple alignments and phylogenetic trees. The PIRSF system provides a comprehensive resource for bioinformatics analysis and comparative studies of protein function and evolution. Domain- or fold-based searches allow identification of evolutionarily related protein families sharing domains or structural folds. Functional convergence and functional divergence are revealed by the relationships between protein classification and curated family functions. The taxonomic distribution allows the identification of lineage-specific or broadly conserved protein families and can reveal horizontal gene transfer. Here we demonstrate, with illustrative examples, how to use the web-based PIRSF system as a tool for functional and evolutionary studies of protein families.
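    The homeomorphic criterion (a common domain architecture plus full-length similarity) can be approximated in a crude sketch. It groups proteins by their ordered domain list and uses a hypothetical length-ratio tolerance as a stand-in for full-length similarity. It does not test homology, and it is not how PIRSF curators actually build families. The domain labels (`PF_A`, `PF_B`) are placeholders, not real Pfam accessions.

```python
from collections import defaultdict


def group_by_architecture(proteins, max_length_ratio=1.25):
    """Group proteins into crude 'homeomorphic' clusters.

    proteins: iterable of (protein_id, length, domains), where domains is the
    ordered list of domain labels along the sequence. Members of a cluster
    share the identical ordered architecture and have lengths within
    max_length_ratio (an assumed tolerance) of the cluster's shortest member.
    Returns a list of (architecture, [protein_id, ...]).
    """
    by_arch = defaultdict(list)
    for pid, length, domains in proteins:
        by_arch[tuple(domains)].append((pid, length))

    families = []
    for arch, members in by_arch.items():
        members.sort(key=lambda m: m[1])  # shortest first
        current = [members[0]]
        for pid, length in members[1:]:
            if length <= current[0][1] * max_length_ratio:
                current.append((pid, length))
            else:
                # Too long relative to the cluster seed: start a new cluster.
                families.append((arch, [m[0] for m in current]))
                current = [(pid, length)]
        families.append((arch, [m[0] for m in current]))
    return families


if __name__ == "__main__":
    toy = [
        ("p1", 300, ["PF_A", "PF_B"]),
        ("p2", 320, ["PF_A", "PF_B"]),
        ("p3", 310, ["PF_B", "PF_A"]),  # same domains, different order
        ("p4", 600, ["PF_A", "PF_B"]),  # same order, far longer
    ]
    for arch, ids in group_by_architecture(toy):
        print(arch, ids)
```

    Domain order matters here: `p3` carries the same domains as `p1` in reverse order, so it lands in a separate group, mirroring the idea that a shared architecture means the same arrangement, not just the same parts.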

    Path2Models: large-scale generation of computational models from biochemical pathway maps

    Background: Systems biology projects and omics technologies have led to a growing number of biochemical pathway models and reconstructions. However, most of these models are still created de novo, based on literature mining and manual processing of pathway data. Results: To increase the efficiency of model creation, the Path2Models project has automatically generated mathematical models from pathway representations using a suite of freely available software. Data sources include KEGG, BioCarta, MetaCyc and SABIO-RK. Depending on the source data, three types of models are provided: kinetic, logical and constraint-based. Models from over 2 600 organisms are encoded consistently in SBML and are made freely available through BioModels Database at http://www.ebi.ac.uk/biomodels-main/path2models. Each model contains the list of participants, their interactions, the relevant mathematical constructs, and initial parameter values. Most models are also available as easy-to-understand graphical SBGN maps. Conclusions: To date, the project has resulted in more than 140 000 freely available models. Such a resource can greatly accelerate the development of mathematical models by providing initial starting models for simulation and analysis, which can subsequently be curated and further parameterized.
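    To illustrate what a "logical" model derived from a pathway map looks like, here is a minimal sketch that turns signed edges into Boolean update rules and simulates them synchronously. The combination rule (a node is ON if any activator is ON and no inhibitor is ON) and the node names are assumptions for illustration. Path2Models' own logical models are encoded in SBML and may use different rules.

```python
def make_logical_model(interactions):
    """Build a synchronous Boolean update function from signed edges.

    interactions: iterable of (source, target, sign) with sign "+" or "-".
    Assumed rule: a target is ON if (it has no activators or any activator
    is ON) and no inhibitor is ON. Nodes without regulators keep their value.
    """
    activators, inhibitors, nodes = {}, {}, set()
    for src, tgt, sign in interactions:
        nodes.update((src, tgt))
        bucket = activators if sign == "+" else inhibitors
        bucket.setdefault(tgt, []).append(src)

    def step(state):
        new_state = {}
        for n in nodes:
            acts, inhs = activators.get(n, []), inhibitors.get(n, [])
            if not acts and not inhs:
                new_state[n] = state[n]  # input node: held constant
            else:
                on = not acts or any(state[a] for a in acts)
                new_state[n] = on and not any(state[i] for i in inhs)
        return new_state

    return nodes, step


def simulate(step, state, n_steps):
    """Apply the synchronous update n_steps times."""
    for _ in range(n_steps):
        state = step(state)
    return state


if __name__ == "__main__":
    # Hypothetical three-step cascade with one inhibitor.
    edges = [("ligand", "receptor", "+"), ("receptor", "kinase", "+"),
             ("phosphatase", "kinase", "-")]
    _, step = make_logical_model(edges)
    start = {"ligand": True, "receptor": False, "kinase": False,
             "phosphatase": False}
    print(simulate(step, start, 2))
```

    The signal needs two synchronous steps to travel from `ligand` to `kinase`. Switching `phosphatase` on at the start keeps `kinase` OFF, which is the qualitative behavior a logical model is meant to capture before any kinetic parameters are available.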